Abstract

The association between two random variables is often of primary interest in statistical research. In this paper semiparametric models for the association between random vectors $X$ and $Y$ are considered which leave the marginal distributions arbitrary. Given that the odds ratio function comprises the whole information about the association, the focus is on bilinear log-odds ratio models and in particular on the odds ratio parameter vector $\theta$. The covariance structure of the maximum likelihood estimator $\hat\theta$ of $\theta$ is of major importance for asymptotic inference. To this end different representations of the estimated covariance matrix are derived for conditional and unconditional sampling schemes and different asymptotic approaches depending on whether $X$ and/or $Y$ has finite or arbitrary support. The main result is the invariance of the estimated asymptotic covariance matrix of $\hat\theta$ with respect to all above approaches. As applications we compute the asymptotic power for tests of linear hypotheses about $\theta$, with emphasis on logistic and linear regression models, which allows one to determine the sample size necessary to achieve a desired power.

1 Introduction and Outline

The question of how a random output vector $Y$ of a system (e.g. the health status of a human) is associated with a random input vector $X$ (e.g. consumption of tobacco and alcohol, environmental pollution and other risk factors) is of major importance in statistical science. If the association between $X$ and $Y$ is of primary interest, then a semiparametric association model is appropriate which leaves the marginal distributions of $X$ and $Y$ arbitrary. However, the association is completely determined by the odds-ratio function $OR(x, y)$ for the joint density $p(x, y)$ with respect to fixed reference values $x_0$ and $y_0$ (cf. Osius 2004, 2009 [5, 6]):

(1.1)  $OR(x, y) = \dfrac{p(x, y)\, p(x_0, y_0)}{p(x, y_0)\, p(x_0, y)}$

A semiparametric odds-ratio model specifies this function up to an unknown parameter vector $\theta$, but leaves the marginal distributions arbitrary. An important class are log-bilinear odds-ratio models given by

(1.2)  $\log OR(x, y) = \tilde x^T \theta\, \tilde y$

where $\tilde x$ and $\tilde y$ are known vector-valued functions of $x$ and $y$ which may coincide with $x$ and $y$, respectively. The association structure of some widely used regression models is log-bilinear, e.g. generalized linear models with canonical link (for univariate $Y$), multivariate linear logistic regression (for $Y$ with finite support) and multivariate linear regression. An advantage of odds-ratio models over these regression models is that inference about the association parameter $\theta$ may also be obtained from samples drawn conditionally on $Y$ (instead of $X$). Generalizing an important result by Prentice and Pyke (1979), it has been shown in Osius, 2009 [6], that the estimator $\hat\theta$ and its estimated asymptotic covariance matrix $\mathrm{Cov}_\infty(\hat\theta)$ for samples conditional on $Y$ are exactly the same as if the sample had been drawn conditionally on $X$. The purpose of this paper is to derive different representations of this covariance matrix, on which statistical analyses (e.g. tests and confidence regions) are based. These results are applied to compute the asymptotic power for tests of linear hypotheses about $\theta$, which allows one to determine the sample sizes necessary to achieve a desired power.

A given random sample $(X_i, Y_i)$, $i = 1, \ldots, n$, containing $J+1$ different $X$-values $X_{(0)}, \ldots, X_{(J)}$ and $K+1$ different $Y$-values $Y_{(0)}, \ldots, Y_{(K)}$ can be summarized by the counts

(1.3)  $R_{jk} = \#\{\, i \mid X_i = X_{(j)},\ Y_i = Y_{(k)} \,\}$

for the observed combinations $(j, k)$. Although the distribution of the table $(R_{jk})$ depends on the sampling scheme (e.g. conditional on $X$ or $Y$), we will show that the estimated asymptotic covariance matrix of $\hat\theta$ is invariant against common sampling schemes and asymptotic approaches. However, we do not establish original asymptotic results here but, using mainly matrix algebra, derive different representations for asymptotic covariance matrices and in particular for $\mathrm{Cov}_\infty(\hat\theta)$.

The paper is organized as follows. Section 2 gives a brief introduction to odds ratio models with emphasis on multivariate linear logistic regression (where $Y$ has finite support) and log-linear models for contingency tables (where the support of $X$ is finite too). Section 3 deals with estimation of $\theta$ under different sampling schemes (unconditional and conditional on $X$ and $Y$, respectively). Our main results are contained in section 4. Based on the work of Haberman, 1974 [4] we first show that for contingency tables (i.e. both $X$ and $Y$ have finite support) the asymptotic distribution of $\hat\theta$ is invariant under the common sampling schemes and provide different representations of $\mathrm{Cov}_\infty(\hat\theta)$. Looking more generally at the multivariate linear logistic regression model (with arbitrary support of $X$) and sampling conditional on $X$ we observe that the estimated asymptotic covariance matrix $\mathrm{Cov}_\infty(\hat\theta)$ is the same as for contingency tables (where $X$ has finite support). The general case allowing arbitrary supports for $X$ and $Y$ is dealt with in section 5. For sampling conditional on $Y$ and a fixed set of conditioning values we conclude that the matrix $\mathrm{Cov}_\infty(\hat\theta)$ is the same as before, where both $X$ and $Y$ had finite support. As a first application we show in section 6 how our results can be used to compute the asymptotic power for testing a linear hypothesis $Q\theta = 0$ and how to determine the sample size necessary to achieve a given power for a value $\theta'$ of interest under the alternative. Finally we demonstrate for univariate $Y$ how the linear resp. log-linear model emerges from an odds-ratio model by imposing additional assumptions on the conditional distribution of $Y$ (given $X$) and conclude with a short discussion of our results. The appendix contains the proofs and some results from linear algebra.

2 Odds Ratio Models

Consider a pair $(X, Y)$ of random vectors defined on some probability space taking values in $\Omega = \Omega_X \times \Omega_Y \subset \mathbb{R}^{M_X} \times \mathbb{R}^{M_Y}$ with joint distribution $P$ and marginal distributions $P^X$ and $P^Y$. To avoid trivialities we assume that $\Omega_X$ and $\Omega_Y$ both have more than one element. Let $\nu_X$ and $\nu_Y$ be two fixed $\sigma$-finite measures on $\mathbb{R}^{M_X}$ and $\mathbb{R}^{M_Y}$ such that $P$ has a positive density $p$ on $\Omega$ with respect to the product measure $\nu = \nu_X \times \nu_Y$, typically a product of Lebesgue or counting measures. The log-density can be parametrized as

(2.1)  $\log p(x, y) = \alpha + \rho(x) + \gamma(y) + \psi_\theta(x, y), \qquad x \in \Omega_X,\ y \in \Omega_Y$

with integrable functions $\rho$, $\gamma$, $\psi_\theta$, an unknown parameter $\theta \in \Theta$, and an integration constant $\alpha$ determined by $\int p \, d\nu = 1$. To guarantee identifiability we assume the constraints

(2.2)  $\rho(x_0) = \gamma(y_0) = 0$

where $x_0 \in \Omega_X$ and $y_0 \in \Omega_Y$ are the reference values of the odds ratio function. The conditional distribution of $Y$ given $X$ has a positive density $p(y \mid X = x)$ given by

(2.3)  $\log p(y \mid X = x) = \gamma(y) + \psi_\theta(x, y) - \delta_\theta(x)$

with an integration constant $\delta_\theta(x)$ and similarly

(2.4)  $\log p(x \mid Y = y) = \rho(x) + \psi_\theta(x, y) - \varepsilon_\theta(y)$.

An important class of parametric association models are log-bilinear association models with respect to the transformed variables $\tilde x = h_X(x)$ and $\tilde y = h_Y(y)$ given by measurable maps $h_X : \mathbb{R}^{M_X} \to \mathbb{R}^{L_X}$ and $h_Y : \mathbb{R}^{M_Y} \to \mathbb{R}^{L_Y}$ which will always be chosen here such that $\tilde x_0 = h_X(x_0) = 0$ and $\tilde y_0 = h_Y(y_0) = 0$. The functions $h_X$ and $h_Y$ are typically injective (one-to-one) but to avoid trivialities we merely assume that they are not constant. The parameter $\theta$ is an $L_X \times L_Y$ matrix and the log-odds ratio function is bilinear in the transformed variables $\tilde x$ and $\tilde y$:

(2.5)  $\psi_\theta(x, y) = \tilde x^T \theta\, \tilde y$ for all $x$, $y$.

This model is semiparametric in the sense that it does not restrict the marginal distributions $P^X$ and $P^Y$ except for reasonable moment conditions. More precisely, it has been shown by Osius, 2009 [6, Sec. 3], that given the marginal distributions $P^X$ and $P^Y$, there exists for any $L_X \times L_Y$ matrix $\theta$ a unique joint distribution $P$ with these marginals such that (2.5) holds, provided the expectations $E(\|h_X(X)\|^2)$ and $E(\|h_Y(Y)\|^2)$ are finite, i.e. the covariance matrices of $h_X(X)$ and $h_Y(Y)$ exist, and this will be assumed throughout the paper.

It will be convenient to interpret an $m \times n$ matrix $A$ as a vector $A$ of length $mn$ obtained by placing the columns of $A$ one after another. Using the Kronecker product $\tilde y \otimes \tilde x$ (cf. appendix B) the model (2.5) may be rewritten as

(2.6)  $\psi_\theta(x, y) = (\tilde y \otimes \tilde x)^T \theta$ for all $x$, $y$.

Any submodel specified by a linear restriction of the form $\theta = A^T \theta^* B$ with given matrices $A$, $B$ and parameter matrix $\theta^*$ yields a log-bilinear association too, with respect to $h^*_X = A h_X$, $h^*_Y = B h_Y$.
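
The equivalence of the bilinear form (2.5) and the Kronecker form (2.6) can be checked numerically. The following is a minimal sketch, assuming NumPy and column-wise stacking as described above; the helper `vec` and all numerical values are illustrative assumptions, not part of the original text.

```python
import numpy as np

def vec(a):
    """Stack the columns of a matrix one after another into a vector."""
    return np.asarray(a).flatten(order="F")

rng = np.random.default_rng(0)
x = rng.normal(size=3)            # transformed covariate, L_X = 3 (arbitrary values)
y = rng.normal(size=2)            # transformed response, L_Y = 2 (arbitrary values)
theta = rng.normal(size=(3, 2))   # L_X x L_Y parameter matrix

bilinear = x @ theta @ y                   # x' theta y, as in (2.5)
kronecker = np.kron(y, x) @ vec(theta)     # (y kron x)' vec(theta), as in (2.6)

# Linear restriction theta = A' theta_star B: same association in the scores A x and B y
A = rng.normal(size=(2, 3))
B = rng.normal(size=(2, 2))
theta_star = rng.normal(size=(2, 2))
restricted = x @ (A.T @ theta_star @ B) @ y
transformed = (A @ x) @ theta_star @ (B @ y)
```

Both pairs of values agree up to rounding, which is all the rewriting (2.6) and the submodel remark rely on.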
The following examples reveal that the association structure of some widely used regression models is in fact log-bilinear.

Example 1: Generalized linear models

Let $Y$ be a univariate random variable and suppose that the conditional density of $Y$ given $X = x$ belongs to the exponential family

(2.7)  $p(y \mid X = x) = \exp\{\phi^{-1}[\,y \cdot \tau(x) - b(\tau(x))\,] + c(y, \phi)\}$

with suitable functions $b$, $c$, $\tau$ and a dispersion parameter $\phi$, compare McCullagh and Nelder (1989). Then the log-odds ratio function has the form

(2.8)  $\psi(x, y) = \phi^{-1}[\tau(x) - \tau(x_0)] \cdot [y - y_0]$

and $\tau(x)$ is a strictly monotone function of the conditional expectation $\mu(x) = E(Y \mid X = x) = b'(\tau(x))$. A generalized linear model with canonical link specifies the canonical parameter

(2.9)  $\tau(x) = \alpha + \tilde x^T \beta$

where $\tilde x \in \mathbb{R}^{L_X}$ is a known vector of formal covariates and $\alpha \in \mathbb{R}$, $\beta \in \mathbb{R}^{L_X}$ are unknown parameters. The corresponding log-odds ratio function

(2.10)  $\psi(x, y) = \tilde x^T \theta\, \tilde y$

is of the form (2.6) with $\tilde y = y$ and parameter $\theta = \phi^{-1}\beta$. Note that the intercept $\alpha$ is no longer present in (2.10). Taking the log-bilinear association model (2.10) instead of (2.9) weakens the distributional assumption while still including the regression parameter $\beta$ up to a positive constant $\phi^{-1}$. In particular a linear hypothesis $Q\beta = 0$ with a given matrix $Q$ is equivalent to $Q\theta = 0$, and for a vector $c$ a one-sided hypothesis $c^T\beta \le 0$ is equivalent to $c^T\theta \le 0$.

A closer look at the relationship between generalized linear models and log-bilinear odds ratio models is given in section 6.2.
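
As an illustration of (2.8) and (2.10), consider a Poisson regression with log link, for which $\phi = 1$ and $\tau(x) = \alpha + \beta x$. The sketch below is an assumption-laden toy example (scalar covariate, reference values $x_0 = 0$, $y_0 = 0$, arbitrary parameter values) that evaluates the log-odds ratio directly from the conditional density and shows that the intercept drops out.

```python
import math

def log_poisson_density(y, tau):
    """Log density of a Poisson response with canonical parameter tau = log(mean)."""
    return y * tau - math.exp(tau) - math.lgamma(y + 1)

def log_odds_ratio(x, y, alpha, beta, x0=0.0, y0=0):
    """Log odds ratio of the conditional density relative to the reference values (x0, y0)."""
    tau_x, tau_0 = alpha + beta * x, alpha + beta * x0
    return (log_poisson_density(y, tau_x) + log_poisson_density(y0, tau_0)
            - log_poisson_density(y0, tau_x) - log_poisson_density(y, tau_0))

beta = 0.7
lor_a = log_odds_ratio(1.5, 4, alpha=-1.0, beta=beta)   # one intercept
lor_b = log_odds_ratio(1.5, 4, alpha=2.0, beta=beta)    # a different intercept
bilinear = 1.5 * beta * 4                               # x * theta * y with theta = beta
```

The two intercepts give the same log-odds ratio, and both coincide with the bilinear form $x\,\theta\,y$.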

Example 2: Log-linear models for contingency tables

An important special case of example 1 are log-linear models for contingency tables. If $X$ and $Y$ have finite support $\Omega_X = \{x_0, \ldots, x_J\}$ and $\Omega_Y = \{y_0, \ldots, y_K\}$ say, then the log-bilinear association model (2.5) can be written as

(2.11)  $\psi_{jk}(\theta) = \tilde x_j^T \theta\, \tilde y_k$, with $\tilde x_j = h_X(x_j)$, $\tilde y_k = h_Y(y_k)$,

or in matrix notation

(2.12)  $\psi(\theta) = \tilde X \theta \tilde Y^T \in \mathbb{R}^{(J+1) \times (K+1)}$, $\quad \tilde X = (\tilde x_{jl}) \in \mathbb{R}^{(J+1) \times L_X}$, $\quad \tilde Y = (\tilde y_{kl}) \in \mathbb{R}^{(K+1) \times L_Y}$.

Then (2.1) reduces to a log-linear model for the probabilities $p_{jk} = p(x_j, y_k)$, namely

(2.13)  $\log p_{jk} = \alpha + \rho_j + \gamma_k + \tilde x_j^T \theta\, \tilde y_k$

with $\rho_0 = \gamma_0 = 0$.

Using Kronecker products the model (2.11) resp. (2.12) can be written as

(2.14)  $\psi_{jk}(\theta) = z_{jk}^T \theta$ with $z_{jk} = \tilde y_k \otimes \tilde x_j \in \mathbb{R}^L$, $L = L_X L_Y$, resp. $\psi(\theta) = Z\theta$ with $Z = \tilde Y \otimes \tilde X \in \mathbb{R}^{(J+1)(K+1) \times L}$.

Note that the "interaction covariate" $z_{jk}$ is the vector representation of the $L_X \times L_Y$ matrix $\tilde x_j \tilde y_k^T$. The parameter $\theta$ will be identifiable if and only if $\tilde X$ has rank $L_X$ and $\tilde Y$ has rank $L_Y$, i.e. $Z$ has rank $L$, and this will always be assumed.

The saturated log-linear model imposes no restriction on the probabilities $p_{jk}$ and may be written as

(2.15)  $\log p_{jk} = \alpha + \rho_j + \gamma_k + \psi_{jk}$

with constraints $\psi_{0k} = \psi_{j0} = 0$. The model (2.13) can also be obtained by restricting the log-odds ratio table $\psi^\circ = (\psi_{jk})_{j,k>0}$ to a linear subspace of $\mathbb{R}^{J \times K}$, namely $\{(\tilde x_j^T \theta\, \tilde y_k)_{j,k>0} \mid \theta \in \mathbb{R}^{L_X \times L_Y}\}$. Hence log-bilinear association models are log-linear models where $\psi$ is restricted to a linear space, but the parameters $\rho_1, \ldots, \rho_J$ and $\gamma_1, \ldots, \gamma_K$ are not restricted (in order to leave the marginal distributions of $X$ and $Y$ unconstrained).
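
The role of the main-effect parameters in (2.13) can be seen in a small numerical sketch. The scores, parameter values and helper names below are illustrative assumptions; the code builds the table $p_{jk}$ from (2.13) and recovers the log-odds ratio table $\tilde X \theta \tilde Y^T$ of (2.12), whatever $\rho_j$ and $\gamma_k$ are.

```python
import numpy as np

def loglinear_table(rho, gamma, X, theta, Y):
    """Cell probabilities p_jk proportional to exp(rho_j + gamma_k + x_j' theta y_k)."""
    eta = rho[:, None] + gamma[None, :] + X @ theta @ Y.T
    p = np.exp(eta)
    return p / p.sum()          # normalising plays the role of the constant alpha

def log_odds_ratios(p):
    """Log odds ratios of every cell relative to the reference cell (0, 0)."""
    logp = np.log(p)
    return logp + logp[0, 0] - logp[:, [0]] - logp[[0], :]

X = np.array([[0.0], [1.0], [2.0]])                   # row scores, reference row is zero
Y = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])    # column scores, reference row is zero
theta = np.array([[0.5, -0.3]])                       # L_X x L_Y = 1 x 2
rho = np.array([0.0, 0.4, -0.2])
gamma = np.array([0.0, 1.0, 0.3])

p = loglinear_table(rho, gamma, X, theta, Y)
psi = log_odds_ratios(p)
```

Changing `rho` or `gamma` alters the margins of `p` but leaves `psi` equal to `X @ theta @ Y.T`.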

Example 3: Multivariate linear logistic regression

Extending univariate logistic regression to the multivariate case, suppose $Y$ takes values in $\Omega_Y = \{0, 1, \ldots, K\}$, $K \ge 1$. Then $\mathcal{L}(Y \mid X = x)$ is a multinomial distribution $M_{K+1}(1, \pi(x))$ with $K + 1$ classes and probabilities $\pi_k(x) = P(Y = k \mid X = x) > 0$. Using the multivariate logistic transformation $\mathrm{logit}\, \pi_k(x) = \log(\pi_k(x)/\pi_0(x))$, the multivariate linear logistic regression model is given by

(2.16)  $\mathrm{logit}\, \pi_k(x) = \gamma_k + \tilde x^T \theta_k, \qquad k = 1, \ldots, K,$

where $\tilde x \in \mathbb{R}^{L_X}$ is as above a vector of formal covariates and $\gamma_k \in \mathbb{R}$, $\theta_k \in \mathbb{R}^{L_X}$ are unknown parameters. Choosing $y_0 = 0$, the log-odds ratio function is

(2.17)  $\psi(x, k) = \tilde x^T \theta_k = \tilde x^T \theta\, h_Y(k) = (h_Y(k) \otimes \tilde x)^T \theta,$

where $\theta = (\theta_1, \ldots, \theta_K)$ is an $L_X \times K$ parameter matrix, and the function $h_Y : \Omega_Y \to \mathbb{R}^K$ maps $k > 0$ to the $k$th unit vector and $h_Y(0) = 0$. The model (2.16) is in fact equivalent to the log-bilinear association model (2.17) provided the parameters $\theta_1, \ldots, \theta_K$ are not restricted (cf. Osius 2004 [5, sec. 4.2]).
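
A short numerical sketch of (2.16) and (2.17), with made-up parameter values and a hypothetical helper `class_probabilities`: the class probabilities follow the multivariate logistic model, and the log-odds ratios relative to the reference values $x_0 = 0$, $y_0 = 0$ reduce to $\tilde x^T \theta_k$, free of the intercepts $\gamma_k$.

```python
import numpy as np

def class_probabilities(x, gamma, theta):
    """pi_k(x), k = 0..K, with logit pi_k(x) = gamma_k + x' theta_k and class 0 as reference."""
    logits = np.concatenate(([0.0], gamma + x @ theta))
    w = np.exp(logits - logits.max())      # shift for numerical stability
    return w / w.sum()

gamma = np.array([0.2, -1.0])                   # K = 2 intercepts (arbitrary)
theta = np.array([[0.5, -0.4], [1.0, 0.3]])     # L_X x K, column k-1 holds theta_k
x0 = np.zeros(2)                                # reference covariate value
x = np.array([1.0, -2.0])

pi_x = class_probabilities(x, gamma, theta)
pi_0 = class_probabilities(x0, gamma, theta)
log_or = np.log(pi_x[1:] / pi_x[0]) - np.log(pi_0[1:] / pi_0[0])
```

Here `log_or` coincides with `x @ theta`, i.e. with the bilinear form in (2.17).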

Example 4: Multivariate linear regression

Let $Y$ and $X$ be random vectors and suppose that the conditional distribution of $Y$ given $X$ is multivariate normal,

(2.18)  $\mathcal{L}(Y \mid X = x) = N_{M_Y}(\mu_Y(x), \Sigma),$

such that the conditional covariance matrix $\Sigma$ is nonsingular and does not depend on $x$. From the conditional log-density

(2.19)  $\log p(y \mid X = x) = -\tfrac{1}{2}\left[\log[(2\pi)^{M_Y} \det(\Sigma)] + [y - \mu_Y(x)]^T \Sigma^{-1} [y - \mu_Y(x)]\right]$

the log-odds ratio function is

(2.20)  $\psi(x, y) = [\mu_Y(x) - \mu_Y(x_0)]^T \Sigma^{-1} y.$

The multivariate linear regression model

(2.21)  $\mu_Y(x) = \alpha + \beta^T \tilde x$

with covariates $\tilde x$ and $L_X \times L_Y$ parameter matrix $\beta$ has a log-bilinear association

(2.22)  $\psi(x, y) = \tilde x^T \theta\, y$

with parameter matrix $\theta = \beta \Sigma^{-1}$. The conditional covariance matrix $\Sigma$, and hence the parameter $\theta$, may be recovered from the regression parameter $\beta$ and the (marginal) covariance matrices of $X$ and $Y$:

(2.23)  $\Sigma = \mathrm{Cov}(Y) - \beta^T \mathrm{Cov}(X) \beta, \qquad \theta = \beta\,[\mathrm{Cov}(Y) - \beta^T \mathrm{Cov}(X) \beta]^{-1}.$

Note that a linear hypothesis $C\beta = 0$ is equivalent to the corresponding hypothesis $C\theta = 0$, and the latter may be tested using the semiparametric association model (2.20) instead of the regression model (2.21) with the additional distributional assumption (2.18).
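
The relations (2.22) and (2.23) can be illustrated with arbitrary numbers. In the sketch below all matrices are invented for illustration; the marginal covariance of $Y$ is obtained from the law of total covariance, after which $\Sigma$ and $\theta$ are recovered as in (2.23). The third row of `beta` is set to zero so that the equivalence of $C\beta = 0$ and $C\theta = 0$ becomes visible.

```python
import numpy as np

beta = np.array([[1.0, 0.5], [-0.5, 2.0], [0.0, 0.0]])   # L_X x L_Y = 3 x 2, third covariate inactive
sigma = np.array([[2.0, 0.3], [0.3, 1.0]])               # conditional covariance of Y given X
cov_x = np.diag([1.0, 4.0, 0.25])                        # marginal covariance of the covariates

# marginal covariance of Y: conditional part plus the part explained by the regression
cov_y = sigma + beta.T @ cov_x @ beta

sigma_rec = cov_y - beta.T @ cov_x @ beta    # first part of (2.23)
theta_rec = beta @ np.linalg.inv(sigma_rec)  # second part of (2.23)
theta = beta @ np.linalg.inv(sigma)          # definition theta = beta Sigma^{-1}

C = np.array([[0.0, 0.0, 1.0]])              # hypothesis matrix picking the third covariate
```

Since $\Sigma^{-1}$ is nonsingular, `C @ beta` vanishes exactly when `C @ theta` does.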



3 Estimation 

We only give a brief overview of the estimation; for details see Osius, 2009 [6, ch. 4]. For a given data set $(x_i, y_i)$ with $i = 1, \ldots, n$ we want to estimate the association parameter $\theta$ of the model (2.6) under unconditional sampling from the joint distribution of $(X, Y)$ and conditional sampling of $Y$ given $X$ or vice versa. Not surprisingly the maximum likelihood estimator $\hat\theta$ under any of these three sampling schemes may be obtained as a solution of the same estimating equation.

3.1 Unconditional Sampling

For unconditional sampling the data set $(x_i, y_i)$ is an independent sample from the joint distribution of $(X, Y)$. Suppose there are $J + 1 > 1$ different $x$-values and $K + 1 > 1$ different $y$-values observed and denote the corresponding subsets of $\mathbb{R}^{M_X}$ and $\mathbb{R}^{M_Y}$ by $\Omega_X = \{x_{(0)}, \ldots, x_{(J)}\}$ and $\Omega_Y = \{y_{(0)}, \ldots, y_{(K)}\}$. If $r_{jk}$ is the observed frequency of $(x_{(j)}, y_{(k)})$, then the likelihood is

(3.1)  $L_{XY} = \prod_{j=0}^{J} \prod_{k=0}^{K} p(x_{(j)}, y_{(k)})^{r_{jk}} = L_{Y|X} \cdot L_X$

with a conditional and a marginal likelihood

(3.2)  $L_{Y|X} = \prod_{k=0}^{K} \prod_{j=0}^{J} p(y_{(k)} \mid X = x_{(j)})^{r_{jk}}, \qquad L_X = \prod_{j=0}^{J} p^X(x_{(j)})^{r_{j+}}$

where the subscript "+" indicates summation over the replaced index. The model does not restrict the marginal distributions of $X$ and $Y$ and hence the empirical densities with respect to counting measure,

(3.3)  $\hat p^Y(y_{(k)}) = r_{+k}/n$ for $k = 0, \ldots, K$

(3.4)  $\hat p^X(x_{(j)}) = r_{j+}/n$ for $j = 0, \ldots, J$

are the usual nonparametric estimators.

Interchanging $X$ and $Y$, we split the likelihood as $L_{XY} = L_{X|Y} \cdot L_Y$. Restricting $\nu_X$ and $\nu_Y$ to measures with finite support $\Omega_X$ and $\Omega_Y$, the likelihood $L_{XY}$ is a multinomial likelihood for the observed $(J + 1) \times (K + 1)$ contingency table $(r_{jk})$, and estimation of $\theta$ is reduced to a multinomial model whose probabilities $p_{jk} = p(x_{(j)}, y_{(k)})$ satisfy the log-odds ratio model

(3.5)  $\log \dfrac{p_{jk}\, p_{00}}{p_{j0}\, p_{0k}} = \psi_\theta(x_{(j)}, y_{(k)}) = \psi_{jk}(\theta)$ for all $j$ and $k$

with respect to the reference values $x_0 = x_{(0)}$ and $y_0 = y_{(0)}$. The parametrization (2.1) now involves only a finite number of parameters

(3.6)  $\log p_{jk} = \rho_j + \gamma_k + \psi_{jk}(\theta) - \log \sum_{j'} \sum_{k'} \exp[\rho_{j'} + \gamma_{k'} + \psi_{j'k'}(\theta)]$

namely $\rho_j = \rho(x_{(j)})$, $\gamma_k = \gamma(y_{(k)})$ and $\theta$ with $\rho_0 = \gamma_0 = 0$. Instead of maximizing $L_{XY}$, it is typically preferable to maximize either $L_{Y|X}$ or $L_{X|Y}$ using the parametrization of the conditional probabilities $p_{k|j} = p_{jk}/p_{j+}$ or $p_{j|k} = p_{jk}/p_{+k}$ given by (2.3) and (2.4),

(3.7)  $\log p_{k|j} = \gamma_k + \psi_{jk}(\theta) - \delta_j, \qquad \log p_{j|k} = \rho_j + \psi_{jk}(\theta) - \varepsilon_k,$

where the parameters $\delta_j$, respectively $\varepsilon_k$, are determined by the remaining ones.
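
That all three likelihoods carry the same information about the association rests on the fact that the odds ratios in (3.5) are unchanged when the joint probabilities are replaced by either set of conditional probabilities. A minimal sketch with an arbitrary $3 \times 3$ table (the numbers are assumptions chosen only for illustration):

```python
import numpy as np

def log_odds_ratios(p):
    """Log odds ratios of every cell relative to the reference cell (0, 0), as in (3.5)."""
    logp = np.log(p)
    return logp + logp[0, 0] - logp[:, [0]] - logp[[0], :]

p = np.array([[0.10, 0.05, 0.05],
              [0.20, 0.10, 0.30],
              [0.05, 0.10, 0.05]])               # joint probabilities p_jk

p_y_given_x = p / p.sum(axis=1, keepdims=True)   # p_{k|j} = p_jk / p_{j+}
p_x_given_y = p / p.sum(axis=0, keepdims=True)   # p_{j|k} = p_jk / p_{+k}

psi_joint = log_odds_ratios(p)
psi_rows = log_odds_ratios(p_y_given_x)
psi_cols = log_odds_ratios(p_x_given_y)
```

The three tables `psi_joint`, `psi_rows` and `psi_cols` are identical, because row and column factors cancel in every odds ratio.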



3.2 Conditional Sampling 

When sampling is conditional on values for $Y$ taken from $\Omega_Y = \{y_{(0)}, \ldots, y_{(K)}\}$, say, then the data set $(x_i, y_i)$ with $i = 1, \ldots, n$ is partitioned into $K + 1$ independent subsamples given by the values of $y_i$, such that each subsample $(x_i)$ with $y_i = y_{(k)}$ is an independent sample from the conditional distribution $\mathcal{L}(X \mid Y = y_{(k)})$. Instead of maximizing the appropriate likelihood $L_{X|Y}$ we can equivalently maximize the unconditional likelihood $L_{XY}$ or even the "reverse" conditional likelihood $L_{Y|X}$. The latter is preferable from a computational point of view when it has fewer nuisance parameters $\gamma_k$ than $L_{X|Y}$ has nuisance parameters $\rho_j$, that is, for $K < J$. A dual argument applies if sampling is conditional on values for $X$ taken from $\Omega_X = \{x_{(0)}, \ldots, x_{(J)}\}$.



3.3 Log-bilinear Association 



In the log-bilinear association model (2.6), the odds ratios may be written as $\psi_{jk}(\theta) = \tilde x_j^T \theta\, \tilde y_k$ with $\tilde x_j = h_X(x_{(j)})$, $\tilde y_k = h_Y(y_{(k)})$ and a parameter matrix $\theta \in \mathbb{R}^{L_X \times L_Y}$, or in matrix notation

(3.8)  $\psi(\theta) = \tilde X \theta \tilde Y^T \in \mathbb{R}^{(J+1) \times (K+1)}$, $\quad \tilde X = (\tilde x_{jl}) \in \mathbb{R}^{(J+1) \times L_X}$, $\quad \tilde Y = (\tilde y_{kl}) \in \mathbb{R}^{(K+1) \times L_Y}$.

Then (3.6) reduces to a log-linear model for the probabilities $p_{jk}$,

(3.9)  $\log p_{jk} = \alpha + \rho_j + \gamma_k + \tilde x_j^T \theta\, \tilde y_k$

induced by the covariates $\tilde x_j$, $\tilde y_k$. Hence results by Haberman, 1974 [4, Ch. 2] on the existence and uniqueness of maximum likelihood estimators in log-linear models apply. In particular the estimator $\hat p = (\hat p_{jk})$ is unique (if it exists) and the estimator $\hat\theta$ is unique too, provided the parameter $\theta$ is identifiable in the log-linear model (3.9). As already noted in example 2, identifiability is equivalent to the conditions

(3.10)  the $L_Y \times K$ matrix $(\tilde y_1, \ldots, \tilde y_K)$ has rank $L_Y$ and the $L_X \times J$ matrix $(\tilde x_1, \ldots, \tilde x_J)$ has rank $L_X$.

This condition will be assumed here throughout. It will be satisfied if the sample is large enough, provided the functions $h_X$ and $h_Y$ (and under conditional sampling the values $x_{(j)}$ resp. $y_{(k)}$) are properly chosen.
3.4 Log-linear Models for Contingency Tables

Since estimation of $\theta$ in a log-bilinear association model can be reduced to estimation in a log-linear model, we now have a closer look at the latter and continue with example 2.

We now assume that $X$ and $Y$ have finite support $\Omega_X = \{x_0, \ldots, x_J\}$ resp. $\Omega_Y = \{y_0, \ldots, y_K\}$ and consider the usual sampling schemes for a $(J + 1) \times (K + 1)$ contingency table $R = (R_{jk})$ of random counts. The expected table will be denoted by $\mu = (\mu_{jk}) = E(R)$. It is important here that in all four sampling schemes the $I \times I$ covariance matrix $\mathrm{Cov}(R)$ with $I = (J + 1)(K + 1)$ can be represented in terms of $D$-orthogonal projections onto a suitable linear subspace (cf. appendix A) where $D = \mathrm{diag}\{\mu\}$ is the diagonal matrix with diagonal $\mu$. Furthermore the unit vector that stems from the $(J + 1) \times (K + 1)$ table having a one in the $(j, k)$th position and zeros otherwise will be denoted by $e_{jk}$.

Multinomial Sampling

Here we take an independent sample $(X_1, Y_1), \ldots, (X_n, Y_n)$ of size $n$ from the joint distribution of $(X, Y)$ and the $(J + 1) \times (K + 1)$ table $R = (R_{jk})$ of counts

$R_{jk} = \#\{\, i = 1, \ldots, n \mid X_i = x_j,\ Y_i = y_k \,\}$

follows a multinomial distribution

(M)  $\mathcal{L}(R) = M_I(n, p)$ with $p = (p_{jk})$

with $\mu_{jk} = n \cdot p_{jk}$. Define $\mathcal{N} = \mathrm{span}\{e_{++}\}$ as the diagonal space that consists of all constant vectors in $\mathbb{R}^I$ and let $P_{\mathcal N}$ be the $D$-orthogonal projection onto the space $\mathcal{N}$, then (cf. Franke, 2010 [2, sec. 2.2]; Haberman, 1974 [4, (1.54)])

(3.11)  $\mathrm{Cov}(R) = D - n^{-1} \mu \mu^T = D(I - P_{\mathcal N}).$

The model (2.13) may also be written as a log-linear model for the expectations $\mu_{jk}$,

(3.12)  $\log \mu_{jk} = \alpha' + \rho_j + \gamma_k + \tilde x_j^T \theta\, \tilde y_k$

with $\alpha' = \alpha + \log n$.
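
The identity (3.11) is easy to confirm numerically. The sketch below assumes a flattened $2 \times 2$ table with arbitrary cell probabilities; the helper `d_orthogonal_projection` is an illustrative name for the projection that is orthogonal with respect to the inner product $\langle u, v \rangle_D = u^T D v$.

```python
import numpy as np

def d_orthogonal_projection(basis, mu):
    """Projection onto span(basis), orthogonal w.r.t. <u, v>_D = u' diag(mu) v."""
    D = np.diag(mu)
    return basis @ np.linalg.solve(basis.T @ D @ basis, basis.T @ D)

n = 50
p = np.array([0.1, 0.2, 0.3, 0.4])     # cell probabilities of a 2 x 2 table, flattened
mu = n * p                             # expected counts
D = np.diag(mu)
ones = np.ones((4, 1))                 # spans the space of constant vectors

cov_direct = D - np.outer(mu, mu) / n                                      # D - mu mu' / n
cov_projection = D @ (np.eye(4) - d_orthogonal_projection(ones, mu))      # D (I - P_N)
cov_multinomial = n * (np.diag(p) - np.outer(p, p))                        # textbook form
```

All three matrices agree, so the multinomial covariance is indeed $D$ with the $D$-orthogonal projection onto the constants removed.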

Poisson Sampling

Consider now an independent sample $(X_1, Y_1), \ldots, (X_N, Y_N)$ from the joint distribution of $(X, Y)$ where the sample size $N$ is an independent random variable having a Poisson distribution $\mathrm{Pois}(\nu)$ with expectation $\nu$. Then the counts

$R_{jk} = \#\{\, i = 1, \ldots, N \mid X_i = x_j,\ Y_i = y_k \,\}$

are independent, each having a Poisson distribution $\mathrm{Pois}(\mu_{jk})$ with $\mu_{jk} = \nu\, p_{jk}$ and total expectation $\mu_{++} = \nu$. Hence the vector $R$ has a product-Poisson distribution and we get the Poisson model

(P)  $\mathcal{L}(R) = \prod_{j=0}^{J} \prod_{k=0}^{K} \mathrm{Pois}(\mu_{jk})$

with $p_{jk} = \mu_{jk}/\mu_{++}$ and $\mathrm{Cov}(R) = D$. The model (2.13) may again be written as in (3.12) with $\alpha' = \alpha + \log \nu$.
Product Multinomial Sampling for Rows

We now look at sampling conditional on $X$ where for each $j = 0, \ldots, J$ independent samples $Y_{j1}, \ldots, Y_{jn_j}$ of size $n_j$ are taken from the conditional distribution $\mathcal{L}(Y \mid X = x_j)$. The rows $R_{j\cdot} = (R_{j0}, \ldots, R_{jK})$ of the counts

$R_{jk} = \#\{\, i = 1, \ldots, n_j \mid Y_{ji} = y_k \,\}$

are independent for $j = 0, \ldots, J$, each with a multinomial distribution

$\mathcal{L}(R_{j\cdot}) = M_{K+1}(n_j, p_{\cdot|j})$ with $p_{k|j} = P\{Y = y_k \mid X = x_j\}$.

Hence the vector $R$ has a product-multinomial distribution and we get the product multinomial sampling for rows

(MR)  $\mathcal{L}(R) = \prod_{j=0}^{J} M_{K+1}(n_j, p_{\cdot|j}),$

with $\mu_{jk} = n_j\, p_{k|j}$, and $\mathrm{Cov}(R) = \mathrm{diag}\{(\Sigma_j)_{j=0,\ldots,J}\}$ is an $(I \times I)$ block-diagonal matrix with blocks $\Sigma_j = \mathrm{Cov}(R_{j\cdot}) = \mathrm{diag}\{\mu_{j\cdot}\} - n_j^{-1} \mu_{j\cdot} \mu_{j\cdot}^T$ and $\mu_{j\cdot}$ the $j$th row of $\mu$. The columns of the $I \times (J + 1)$ matrix $F = (e_{0+}, \ldots, e_{J+})$ span the row space $\mathcal{N} = \mathrm{span}\{e_{0+}, \ldots, e_{J+}\}$ of this sampling scheme, which consists of all vectors arising from $(J + 1) \times (K + 1)$ tables with constant rows. Since $\langle e_{j+}, e_{l+} \rangle_D = \delta_{jl} \cdot n_j$ (using Kronecker's $\delta$) the vectors $e_{0+}, \ldots, e_{J+}$ are pairwise $D$-orthogonal.

The covariance matrix of $R$ can also be represented as (cf. Franke, 2010 [2, sec. 2.3]; Haberman, 1974 [4, (1.54)])

(3.13)  $\mathrm{Cov}(R) = \mathrm{diag}\{(\Sigma_j)_j\} = \mathrm{diag}\{(\mathrm{diag}\{\mu_{j\cdot}\} - n_j^{-1} \mu_{j\cdot} \mu_{j\cdot}^T)_j\} = D(I - P_{\mathcal N}).$

Again the model (2.13) may be written in terms of the expectations as

(3.14)  $\log \mu_{jk} = \alpha + \rho'_j + \gamma_k + \tilde x_j^T \theta\, \tilde y_k$

with $\rho'_j = \rho_j + \log(n_j/p_{j+})$.
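
The block-diagonal form and the projection form in (3.13) can be compared in a few lines. The sketch below uses an invented $2 \times 3$ expected table and orders the cells row by row, so that each row of the table occupies a contiguous block (this ordering is an assumption made for convenience).

```python
import numpy as np

mu = np.array([[6.0, 3.0, 1.0],
               [4.0, 8.0, 8.0]])      # expected table; row totals n_j = 10 and 20
J1, K1 = mu.shape
n = mu.sum(axis=1)

# block-diagonal covariance with blocks diag(mu_j.) - mu_j. mu_j.' / n_j
cov_blocks = np.zeros((J1 * K1, J1 * K1))
for j in range(J1):
    block = np.diag(mu[j]) - np.outer(mu[j], mu[j]) / n[j]
    cov_blocks[j * K1:(j + 1) * K1, j * K1:(j + 1) * K1] = block

# F has one indicator column e_{j+} per row of the table
F = np.kron(np.eye(J1), np.ones((K1, 1)))
D = np.diag(mu.ravel())
gram = F.T @ D @ F                               # D-inner products of the e_{j+}
P = F @ np.linalg.solve(gram, F.T @ D)           # D-orthogonal projection onto the row space
cov_projection = D @ (np.eye(J1 * K1) - P)
```

The Gram matrix `gram` is diagonal with entries $n_j$, which is the $D$-orthogonality stated above, and the two covariance matrices coincide.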

Product Multinomial Sampling for Columns

Let us finally consider sampling conditional on $Y$ where for each $k = 0, \ldots, K$ we take independent samples $X_{k1}, \ldots, X_{km_k}$ of size $m_k$ from the conditional distribution $\mathcal{L}(X \mid Y = y_k)$. The columns $R_{\cdot k} = (R_{0k}, \ldots, R_{Jk})$ of the counts $R_{jk} = \#\{\, i = 1, \ldots, m_k \mid X_{ki} = x_j \,\}$ are independent for $k = 0, \ldots, K$, each with a multinomial distribution. Hence the vector $R$ has a product-multinomial distribution and satisfies the product multinomial sampling for columns

(MC)  $\mathcal{L}(R) = \prod_{k=0}^{K} M_{J+1}(m_k, p_{\cdot|k})$ with $p_{j|k} = P\{X = x_j \mid Y = y_k\} = p_{jk}/p_{+k},$

$\mu_{jk} = m_k\, p_{j|k}$ and $\mathrm{Cov}(R) = \mathrm{diag}\{(\mathrm{diag}\{\mu_{\cdot k}\} - m_k^{-1} \mu_{\cdot k} \mu_{\cdot k}^T)_{k=0,\ldots,K}\}$ with $\mu_{\cdot k}$ the $k$th column of $\mu$. The columns of the $I \times (K + 1)$ matrix $G = (e_{+0}, \ldots, e_{+K})$ span the column space $\mathcal{N} = \mathrm{span}\{e_{+0}, \ldots, e_{+K}\}$ of this sampling scheme, which consists of all vectors arising from $(J + 1) \times (K + 1)$ tables with constant columns. Interchanging rows with columns, i.e. looking at the transposed table $R^T$, leads us back to the product model for rows and (3.13) yields

(3.15)  $\mathrm{Cov}(R) = D(I - P_{\mathcal N}).$
3.4.1 Log-linear Models for the Expected Table

In all four sampling schemes above, the expected table $\mu = E(R)$ satisfies a log-linear model

(3.16)  $\eta_{jk} = \alpha + \rho_j + \gamma_k + \tilde x_j^T \theta\, \tilde y_k$ respectively $\eta = \log \mu \in \mathcal{M}$

with a linear subspace $\mathcal{M}$ of $\mathbb{R}^I$.

Viewing $\tilde x_1, \ldots, \tilde x_J$ and $\tilde y_1, \ldots, \tilde y_K$ as "scores" assigned to the rows resp. columns, the above model appears as a generalization of the linear-by-linear association model in Agresti (1990, sec. 8.1.1) with vector-valued scores instead of scalars. The above model may be rewritten as

(3.17)  $\eta_{jk} = \alpha + \rho_j + \gamma_k + z_{jk}^T \theta$ with

(3.18)  $z_{jk} = \tilde y_k \otimes \tilde x_j.$

The vector $z_{jk}$ of dimension $L$ may be interpreted as an "interaction covariate" associated to the $(j, k)$th cell of the $(J + 1) \times (K + 1)$ table and satisfies the constraints $z_{j0} = z_{0k} = 0$. Although any log-linear model is of the form (3.17), it will only represent a log-bilinear association in our sense if the "covariate" $z_{jk}$ has a decomposition (3.18), which guarantees that $z_{jk}$ does not contain any information about the association of $X$ and $Y$.

In the Poisson model (P) the parameters $\alpha'$, $\rho_j$ and $\gamma_k$ are not restricted (cf. example 2) or equivalently, the marginal space $\mathrm{span}\{e_{0+}, \ldots, e_{J+}, e_{+0}, \ldots, e_{+K}\}$ is a linear subspace of $\mathcal{M}$, and this will be assumed from now on.

Given an observed table $r$ of counts the maximum likelihood estimator (in any of the four sampling schemes) $\hat\mu = \exp[\hat\eta]$ with $\hat\eta \in \mathcal{M}$ of $\mu$ is the unique solution (provided there is one) of the same normal equation

(3.19)  $P_{\mathcal M}\, \hat\mu = P_{\mathcal M}\, r,$

with $P_{\mathcal M}$ the orthogonal projection onto $\mathcal{M}$, cf. Haberman, 1974 [4, ch. 2] who also gives criteria for the existence of the estimate. In particular, since the marginal space is contained in $\mathcal{M}$, the tables $\hat\mu$ and $r$ have the same row and column totals

(3.20)  $\hat\mu_{j+} = r_{j+}$ for $j = 0, \ldots, J$, $\qquad \hat\mu_{+k} = r_{+k}$ for $k = 0, \ldots, K$.

The odds ratio parameter $\theta$ is a function of $\eta$ resp. $\mu$, and will be estimated as the corresponding function. Conversely, $\hat\mu$ is the unique table determined by the log-odds ratios $\hat\psi_{jk} = \tilde x_j^T \hat\theta\, \tilde y_k$ and the totals $r_{j+}$ and $r_{+k}$ of the observed table for all $j$ and $k$ (cf. Plackett, 1974 [7, sec. 3.4]).
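
One way to compute the table determined by given log-odds ratios and given margins is iterative proportional fitting. This technique is not named in the text; it is used here only as a convenient sketch, and the scores, the value of $\theta$ and the margins below are invented. Rescaling rows and columns never changes an odds ratio, so starting from $\exp(\psi)$ the iteration keeps the association fixed while adjusting the totals.

```python
import numpy as np

def fit_table(psi, row_totals, col_totals, iterations=500):
    """Iterative proportional fitting: rescale rows and columns of exp(psi) to match the margins."""
    mu = np.exp(psi)
    for _ in range(iterations):
        mu *= (row_totals / mu.sum(axis=1))[:, None]   # match row totals
        mu *= (col_totals / mu.sum(axis=0))[None, :]   # match column totals
    return mu

def log_odds_ratios(mu):
    """Log odds ratios of every cell relative to the reference cell (0, 0)."""
    log_mu = np.log(mu)
    return log_mu + log_mu[0, 0] - log_mu[:, [0]] - log_mu[[0], :]

X = np.array([[0.0], [1.0], [2.0]])    # row scores
Y = np.array([[0.0], [1.0]])           # column scores
theta = np.array([[0.8]])
psi = X @ theta @ Y.T                  # target log-odds ratios

row_totals = np.array([30.0, 50.0, 20.0])
col_totals = np.array([60.0, 40.0])
mu = fit_table(psi, row_totals, col_totals)
```

The fitted `mu` reproduces both margins and has exactly the log-odds ratios `psi`, illustrating why $\hat\mu$ can be recovered from $\hat\theta$ and the observed totals.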



4 Asymptotic Covariance Matrices 

In this section, which contains the main results of this paper, we derive different representations for the (estimated) asymptotic covariance matrix $\Sigma_\theta$ of the estimator $\hat\theta$. Here we assume that $Y$ has finite support and show in section 5 how the general case with arbitrary support for $Y$ can be reduced to finite support. We first look at log-linear models for contingency tables (example 2) where $X$ has finite support too. Then we consider the multivariate linear logistic regression model (example 3) with arbitrary support for $X$. Although the asymptotic covariance matrices arise from suitable asymptotic assumptions, and are only applicable given these assumptions, their estimates can always be computed for a given sample. Using matrix algebra only, we are going to show that the different estimates considered here all result in the same matrix.






4.1 Log-linear Models for Contingency Tables 



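Continuing our discussion in 3.4 we consider a log-linear model given by $\eta \in \mathcal{M}$, where $\mathcal{M}$ contains the marginal space, and assume any of the four distribution models (M), (P), (MR) or (MC). The asymptotic normality of the estimates $\hat\mu$ and $\hat\eta$ given by Haberman, 1974 [4, Th 4.4] (for an asymptotic approach with fixed cells, i.e. $J$ and $K$ are fixed, and suitably increasing expectations $\mu_{jk}$ in each cell $(j, k)$) implies that the asymptotic covariance matrices of $\hat\mu$ and $\hat\eta$ are given by

(4.1)  $\mathrm{Cov}_\infty(\hat\mu) = D(P_{\mathcal M} - P_{\mathcal N}), \qquad \mathrm{Cov}_\infty(\hat\eta) = (P_{\mathcal M} - P_{\mathcal N})\, D^{-1}$

(4.2)  with $\mathcal{N} = \begin{cases} \mathrm{span}\{e_{++}\} & \text{for the model (M)}, \\ \mathrm{span}\{e_{0+}, \ldots, e_{J+}\} & \text{for the model (MR)}, \\ \mathrm{span}\{e_{+0}, \ldots, e_{+K}\} & \text{for the model (MC)}, \\ \{0\} & \text{for the model (P)} \end{cases}$ and $D = \mathrm{diag}\{\mu\}$,

where $P_{\mathcal M}$ and $P_{\mathcal N}$ here denote the $D$-orthogonal projections onto $\mathcal{M}$ and $\mathcal{N}$.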



In each of the four sampling schemes the projection $P_{\mathcal N} R$ is fixed by design, e.g. the row sums $R_{j+}$ in (MR), and the distribution of $R$ may be obtained from the Poisson model (P) by conditioning upon $P_{\mathcal N} R = c$ for a suitable $c$. To derive the asymptotic covariance matrix $\Sigma_\theta$ of $\hat\theta$ we use the representation

(4.3)  $\eta_{jk} = \alpha + \rho_j + \gamma_k + \psi_{jk}(\theta)$ with $\psi_{jk}(\theta) = z_{jk}^T \theta$.

Although in a log-bilinear association model $z_{jk}$ is given by (2.14), we derive the following results in appendix C.1 without this restriction and consider the particular case (2.14) separately. For later purposes we consider the compound parameter $\lambda = (\gamma^\circ, \theta)$ with $\gamma^\circ = (\gamma_k)_{k>0}$. We define further the $(JK \times L)$ matrix $Z^\circ = (z_{jk}^T)_{j,k>0}$, the $(I \times JK)$ matrix $C$ through the columns $c_{jk} = e_{jk} + e_{00} - e_{j0} - e_{0k}$, $j, k > 0$, and the $(I \times K)$ matrix $B$ through the columns $b_k = e_{0k} - e_{00}$.


Theorem 1 In the log-linear model given by $\eta \in \mathcal{M}$ and (4.3) the asymptotic covariance matrix of the estimator $\hat\lambda$ is given by



(4.4) 

or in block notation 
(4.5) 



^7° ^re 



In particular the asymptotic covariance matrix of $\hat\theta$ is given by

(4.6)  $\Sigma_\theta = Z^{\circ -}\, C^T P_{\mathcal M}\, D^{-1} C\, Z^{\circ -T}$

where $Z^{\circ -}$ is a left inverse of $Z^\circ$, and does not depend on the space $\mathcal{N}$ (which determines the sampling scheme).

Remark: The above representation of $\Sigma_\theta$ contains in $D$ the vector $\mu$ of expectations which depends on the sampling scheme. However the estimate $\hat\mu$, and hence the corresponding estimate $\hat\Sigma_\theta$ of $\Sigma_\theta$, is the same in the sampling schemes (P), (M), (MR) and (MC) and can be recovered from $\hat\theta$ and the row and column totals of the observed table.
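4.2 An Explicit Representation of $\Sigma_\theta$

To get a more explicit representation of the asymptotic covariance matrix $\Sigma_\theta$ in terms of the vectors $z_{jk}$ we first eliminate the projection $P_{\mathcal M}$ in (4.6) and obtain the following representation (cf. appendix C.2).

Theorem 2 In the log-linear model given by (4.3) the asymptotic covariance matrix of the estimator $\hat\theta$ is

(4.7)  $\Sigma_\theta = \left(Z^{\circ T} (C^T D^{-1} C)^{-1} Z^\circ\right)^{-1}$

with $D = \mathrm{diag}\{\mu\}$ and $Z^\circ = (z_{jk}^T)_{j,k>0}$. Since $c_{jk} = e_{jk} + e_{00} - e_{j0} - e_{0k}$, the matrix $C^T D^{-1} C$ has for $j \ne l$ and $k \ne m$ the following elements

(4.8)  $(C^T D^{-1} C)_{jk,jk} = \mu_{00}^{-1} + \mu_{j0}^{-1} + \mu_{0k}^{-1} + \mu_{jk}^{-1}$, $\quad (C^T D^{-1} C)_{jk,jm} = \mu_{00}^{-1} + \mu_{j0}^{-1}$, $\quad (C^T D^{-1} C)_{jk,lk} = \mu_{00}^{-1} + \mu_{0k}^{-1}$, $\quad (C^T D^{-1} C)_{jk,lm} = \mu_{00}^{-1}$.

The remark to theorem 1 still applies here. This compact form of $\Sigma_\theta$ is helpful to evaluate the influence of the covariates and the estimates on the asymptotic covariance matrix $\Sigma_\theta$.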

Example (saturated model): For the saturated model the matrix $Z^\circ$ is the identity matrix. Hence

(4.9)  $\Sigma_\theta = C^T D^{-1} C,$

and its estimate $\hat\Sigma_\theta$ can be evaluated from (4.8) with $\mu$ replaced by the observed table $r$. In particular for a $2 \times 2$ contingency table $R$ with $\Omega_X = \{0, 1\}$ and $\Omega_Y = \{0, 1\}$, i.e. $J = K = 1$, we get the scalar $\hat\Sigma_\theta = r_{00}^{-1} + r_{01}^{-1} + r_{10}^{-1} + r_{11}^{-1}$, which is well known as the asymptotic variance of the estimator of the log-odds ratio parameter $\theta$.

□
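
The representation (4.7) translates directly into a few lines of matrix code. The function `sigma_theta` below is a hypothetical helper written for this illustration (cells are flattened row by row, scores and counts are made up); it builds the contrast matrix $C$ and the matrix $Z^\circ$ from the scores and evaluates (4.7). For a $2 \times 2$ table with scores 0 and 1 it returns the sum of reciprocal cell counts from the example above.

```python
import numpy as np

def sigma_theta(mu, X, Y):
    """Evaluate (4.7) for a table mu with row scores X (first row zero) and column scores Y."""
    J1, K1 = mu.shape
    c_cols, z_rows = [], []
    for j in range(1, J1):
        for k in range(1, K1):
            c = np.zeros((J1, K1))
            c[j, k] += 1.0; c[0, 0] += 1.0; c[j, 0] -= 1.0; c[0, k] -= 1.0
            c_cols.append(c.ravel())                 # c_jk = e_jk + e_00 - e_j0 - e_0k
            z_rows.append(np.kron(Y[k], X[j]))       # z_jk = y_k kron x_j
    C = np.array(c_cols).T                           # I x JK
    Z0 = np.array(z_rows)                            # JK x L
    M = C.T @ np.diag(1.0 / mu.ravel()) @ C          # C' D^{-1} C
    return np.linalg.inv(Z0.T @ np.linalg.solve(M, Z0))

r = np.array([[10.0, 20.0], [30.0, 40.0]])           # observed 2 x 2 table
scores = np.array([[0.0], [1.0]])
variance = sigma_theta(r, scores, scores)[0, 0]
reciprocal_sum = (1.0 / r).sum()
```

For larger tables the same function can be called with the fitted table $\hat\mu$ and the vector-valued scores of the log-bilinear model.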



In appendix C.3 we derive another representation of $\Sigma_\theta$, which will be used to prove

Theorem 3 In the log-linear model given by (4.3) the asymptotic covariance matrix of the estimator $\hat\theta$ can be written in terms of the covariance matrix $\mathrm{Cov}_{MR}(R) = D(I - P_{\mathcal N})$ for the sampling scheme (MR), cf. (3.13), $E = (e_{+1}, \ldots, e_{+K})$ and the $(J + 1)(K + 1) \times L$ matrix $Z = (z_{jk}^T)$ as

(4.10)  $\Sigma_\theta = \left[Z^T \mathrm{Cov}_{MR}(R)\, Z - Z^T \mathrm{Cov}_{MR}(R)\, E\, (E^T \mathrm{Cov}_{MR}(R)\, E)^{-1} E^T \mathrm{Cov}_{MR}(R)\, Z\right]^{-1}$

for the sampling schemes (P), (M), (MR) and (MC).

Again, the remark to theorem 1 applies. This representation has been used to evaluate the covariance matrix for the special cases $K = 1$ and $K = 2$ (which also apply to linear logistic regression as remarked in 4.4), cf. Franke, 2010 [2, sec. 5.1.3].
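4.3 $\Sigma_\theta$ in Log-bilinear Association Models

In the log-bilinear model (2.14) the matrix $Z^\circ$ is the Kronecker product of $\tilde X^\circ = (\tilde x_{jl})_{j>0,\, l=1,\ldots,L_X}$ and $\tilde Y^\circ = (\tilde y_{kl})_{k>0,\, l=1,\ldots,L_Y}$, i.e. $Z^\circ = \tilde Y^\circ \otimes \tilde X^\circ$. From the properties of Kronecker's product the left inverse $Z^{\circ -}$ can be obtained from the left inverses $\tilde X^{\circ -}$ and $\tilde Y^{\circ -}$ of $\tilde X^\circ$ and $\tilde Y^\circ$ as

(4.11)  $Z^{\circ -} = \tilde Y^{\circ -} \otimes \tilde X^{\circ -}, \qquad Z^{\circ -T} = \tilde Y^{\circ -T} \otimes \tilde X^{\circ -T}.$

Theorem 1 applied to a log-bilinear association model gives

(4.12)  $\Sigma_\theta = (\tilde Y^{\circ -} \otimes \tilde X^{\circ -})\, C^T P_{\mathcal M}\, D^{-1} C\, (\tilde Y^{\circ -T} \otimes \tilde X^{\circ -T})$

and theorem 2 yields

Corollary 1 In the log-bilinear model given by (2.14) the asymptotic covariance matrix of the estimator $\hat\theta$ is

(4.13)  $\Sigma_\theta = \left((\tilde Y^{\circ T} \otimes \tilde X^{\circ T})\, (C^T D^{-1} C)^{-1} (\tilde Y^\circ \otimes \tilde X^\circ)\right)^{-1}$

with $D = \mathrm{diag}\{\mu\}$.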
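
The Kronecker rule (4.11) for left inverses can be verified with the Moore-Penrose pseudo-inverse, which is one particular left inverse when the matrix has full column rank. The matrices below are random stand-ins for $\tilde X^\circ$ and $\tilde Y^\circ$ (an assumption for illustration only).

```python
import numpy as np

rng = np.random.default_rng(1)
X0 = rng.normal(size=(4, 2))     # stand-in for the J x L_X score matrix, full column rank
Y0 = rng.normal(size=(3, 2))     # stand-in for the K x L_Y score matrix, full column rank
Z0 = np.kron(Y0, X0)             # JK x L with L = L_X * L_Y

left_inverse = np.kron(np.linalg.pinv(Y0), np.linalg.pinv(X0))
product = left_inverse @ Z0      # should be the L x L identity matrix
```

Because the left inverse factorises, $\Sigma_\theta$ in (4.12) can be computed from the much smaller matrices $\tilde X^\circ$ and $\tilde Y^\circ$ without ever inverting $Z^\circ$ directly.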



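The $(J+1)(K+1) \times L$ matrix $Z = (z_{jk})$ is the Kronecker product $Z = Y \otimes X$ and theorem 3 leads to

(4.14) $\Sigma_\theta = \left[ (Y^T \otimes X^T) \left( \mathrm{Cov}_{MR}(R) - \mathrm{Cov}_{MR}(R)\, E\, (E^T \mathrm{Cov}_{MR}(R)\, E)^{-1} E^T \mathrm{Cov}_{MR}(R) \right) (Y \otimes X) \right]^{-1}$

with $\mathrm{Cov}_{MR}(R) = D P^D_{\mathcal{R}^{\perp_D}}$.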
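The representation (4.14) only needs the product-multinomial covariance of the table. The sketch below is one possible way to evaluate it, assuming NumPy, the same $k$-major cell ordering as `np.kron(Y, X)`, and score matrices `X` and `Y` whose first rows (the reference categories) are zero. The function names are hypothetical.

```python
import numpy as np

def cov_product_multinomial_rows(mu):
    """Covariance of the vectorised table under scheme (MR), k-major cell order."""
    mu = np.asarray(mu, dtype=float)
    J1, K1 = mu.shape
    cov = np.zeros((J1 * K1, J1 * K1))
    for j in range(J1):
        idx = [k * J1 + j for k in range(K1)]      # cells belonging to row j
        row = mu[j]
        # multinomial block: diag(mu_j) - mu_j mu_j^T / mu_{j+}
        cov[np.ix_(idx, idx)] = np.diag(row) - np.outer(row, row) / row.sum()
    return cov

def sigma_theta_mr(mu, X, Y):
    """Evaluate (4.14); X is (J+1) x L_X, Y is (K+1) x L_Y, both with zero first row."""
    mu = np.asarray(mu, dtype=float)
    J1, K1 = mu.shape
    V = cov_product_multinomial_rows(mu)
    E = np.kron(np.eye(K1)[:, 1:], np.ones((J1, 1)))   # columns e_{+1}, ..., e_{+K}
    Z = np.kron(Y, X)
    VE = V @ E
    middle = V - VE @ np.linalg.solve(E.T @ VE, VE.T)
    return np.linalg.inv(Z.T @ middle @ Z)

# 2 x 2 check: again the sum of the reciprocal cell expectations
mu = np.array([[10.0, 20.0], [30.0, 40.0]])
print(sigma_theta_mr(mu, np.array([[0.0], [1.0]]), np.array([[0.0], [1.0]])))
```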





4.4 Multivariate Linear Logistic Regression with Sampling Conditional on X

Consider now the more general case where $X$ is a random vector with arbitrary support, but $Y$ still has finite support $\Omega_Y = \{y_0, \ldots, y_K\}$. We assume a sampling scheme conditional on $X$ and choose $J + 1$ different values $\Omega_X^* = \{x_0, \ldots, x_J\}$ of $X$. For each $j = 0, \ldots, J$ an independent subsample $Y_{ji}$ with $i = 1, \ldots, n_j$ is drawn from the conditional distribution of $Y$ given $X = x_j$ and, as before, the counts for $y_k$ in this subsample are denoted by $R_{jk} = \#\{i \mid Y_{ji} = y_k\}$. The resulting distribution model for the contingency table $R$ is the product multinomial sampling for rows with conditional probabilities $\pi_{jk|X} = P\{Y = y_k \mid X = x_j\}$ that are specified through the multivariate linear logistic regression model

(4.15) $\operatorname{logit}_k(\pi_{j\cdot|X}) = \gamma_k + z_{jk}\theta$ for $j = 0, \ldots, J$, $k = 1, \ldots, K$

with arbitrary covariates $z_{jk}$ satisfying $z_{j0} = z_{0k} = 0$ for all $j$ and all $k$. Note that the following statements hold not only for bilinear odds-ratio models with $z_{jk} = \tilde y_{(k)} \otimes \tilde x_{(j)}$ from (2.17) but also for the more general model (4.15).



The log-likelihood with respect to $\lambda = (\gamma, \theta)$ is

(4.16) $\log L_{Y|X}(\lambda) = \sum_{j=0}^{J} \sum_{k=0}^{K} R_{jk} \log \pi_{jk|X}$

and the score vector $U(\lambda)$ is its gradient

(4.17) $U(\lambda) = D_\lambda \log L_{Y|X}(\lambda)^T$

with covariance matrix given by the second derivative of $\log L_{Y|X}$

(4.18) $\mathrm{Cov}(U(\lambda)) = E\left(-D^2_{\lambda\lambda} \log L_{Y|X}(\lambda)\right) = -D^2_{\lambda\lambda} \log L_{Y|X}(\lambda).$

It is well known that the inverse of this matrix is, under mild conditions, the asymptotic covariance matrix of the estimator $\hat\lambda$ when the total sample size increases. In appendix C.4 we prove the fundamental result that $\mathrm{Cov}(U(\lambda))^{-1}$ coincides with the asymptotic covariance matrix $\Sigma_\lambda$ for $\hat\lambda$ given in theorem 1 where $X$ had finite support.

Theorem 4 For sampling conditional on $X$ the inverse of the covariance matrix of the score vector $U(\lambda)$ is given by

$\mathrm{Cov}(U(\lambda))^{-1} = \Sigma_\lambda$

with $\Sigma_\lambda$ from theorem 1.

Hence the estimate $\hat\lambda$ and its asymptotic covariance $\Sigma_\lambda$ can be determined as if $X$ had finite support $\Omega_X^*$. In particular any statistical software package for multivariate linear logistic regression or log-linear models can be used to compute $\hat\lambda$ and the estimate $\hat\Sigma_\lambda$ of $\Sigma_\lambda$ as well as to perform further statistical analysis, like tests and confidence intervals. Furthermore the representations of the estimated asymptotic covariance matrix $\hat\Sigma_\theta$ given in 4.1 and 4.2 apply here too.
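To make the statement of theorem 4 concrete in the simplest setting, the following sketch fits a binary logistic regression $\operatorname{logit} P(Y = 1 \mid x_j) = \gamma + \theta x_j$ to grouped data by Newton-Raphson and returns the inverse information matrix at the estimate. This is a hand-rolled illustration with hypothetical function names, restricted to binary $Y$ and a scalar covariate. For a $2 \times 2$ table its $\theta\theta$ entry reproduces the sum of reciprocal cell counts obtained from the log-linear analysis.

```python
import math

def _score_and_information(gamma, theta, x, n, r):
    """Score vector and information matrix entries for grouped binary data."""
    u_g = u_t = i_gg = i_gt = i_tt = 0.0
    for xj, nj, rj in zip(x, n, r):
        p = 1.0 / (1.0 + math.exp(-(gamma + theta * xj)))
        w = nj * p * (1.0 - p)
        u_g += rj - nj * p
        u_t += xj * (rj - nj * p)
        i_gg += w
        i_gt += w * xj
        i_tt += w * xj * xj
    return (u_g, u_t), (i_gg, i_gt, i_tt)

def _inverse_2x2(i_gg, i_gt, i_tt):
    det = i_gg * i_tt - i_gt * i_gt
    return [[i_tt / det, -i_gt / det], [-i_gt / det, i_gg / det]]

def fit_grouped_logistic(x, n, r, iterations=25):
    """x[j]: covariate value, n[j]: subsample size, r[j]: count of Y = 1.

    Returns (gamma_hat, theta_hat, cov) with cov the inverse information
    evaluated at the final parameter values.
    """
    gamma = theta = 0.0
    for _ in range(iterations):
        (u_g, u_t), info = _score_and_information(gamma, theta, x, n, r)
        cov = _inverse_2x2(*info)
        gamma += cov[0][0] * u_g + cov[0][1] * u_t   # Newton step
        theta += cov[1][0] * u_g + cov[1][1] * u_t
    _, info = _score_and_information(gamma, theta, x, n, r)
    return gamma, theta, _inverse_2x2(*info)

# rows x = 0: (r00, r01) = (10, 20); x = 1: (r10, r11) = (30, 40)
gamma_hat, theta_hat, cov = fit_grouped_logistic([0.0, 1.0], [30, 70], [20, 40])
print(theta_hat, cov[1][1])
```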



5 Arbitrary Support of Y and Sampling conditional on Y 

So far we have assumed that $Y$ has finite support and we now consider the general case with arbitrary support for $Y$ and $X$. Although the maximum likelihood estimate $\hat\theta$ of the association parameter $\theta$ may be obtained by maximizing the likelihood for conditional or unconditional sampling, the stochastic properties of the estimate depend on the sampling scheme. Let us consider sampling conditional on $Y$, which can be preferable from a practical point of view, and summarize properties of the estimate $\hat\theta$; for details see Osius, 2009 [5, sec. 5-7]. It is convenient to represent the sample as a compound vector $X = (X_{ki})$ of independent random variables indexed by $k = 0, \ldots, K$ and $i = 1, \ldots, m_k$. Using the notations from [X2] without the parentheses in $y_{(k)}$ and $x_{(j)}$, each $X_{ki}$ is distributed as $X_k \sim \mathcal{L}(X \mid Y = y_k)$. Let $R_{jk} = \#\{i \mid X_{ki} = x_j\}$ denote the frequency of $x_j$ in the subsample $(X_{ki})$. Then $R_{+k} = m_k$ is fixed and the empirical distribution on $\Omega_Y = \{y_0, \ldots, y_K\}$ is given by the proportions $\hat m_k = m_k/n$, where $n = m_+$ is the total sample size. Replacing in the joint distribution $P$ of $(X, Y)$ the marginal distribution of $Y$ by the empirical distribution (3.3) yields a joint distribution $P^*$ given by the density $p^*$ with respect to the product of $\nu_X$ and the counting measure on $\Omega_Y$:

$p^*(x, y_k) = \hat m_k \cdot p(x \mid Y = y_k)$ for all $x, k$.



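Denoting the conditional density of $Y$ given $X = x$ under $P^*$ by

(5.1) $p_k^*(x) = p^*(y_k \mid X = x) = \dfrac{\hat m_k \cdot p(x \mid Y = y_k)}{\sum_{l=0}^{K} \hat m_l \cdot p(x \mid Y = y_l)}$

equation (2.3) yields the parametrization $\log p_k^*(x) = \gamma_k^* + \psi_\theta(x, y_k) - \delta^*(x)$ with nuisance parameters $\gamma_k^* = \gamma^*(y_k)$ and $\delta^*(x) = \log\left[\sum_l \exp(\gamma_l^* + \psi_\theta(x, y_l))\right]$, hence

(5.2) $p_k^*(x) = \dfrac{\exp(\gamma_k^* + \psi_\theta(x, y_k))}{\sum_{l=0}^{K} \exp(\gamma_l^* + \psi_\theta(x, y_l))}.$

From the constraints (2.2) we obtain $\gamma_0^* = 0$, and the nuisance parameter is $\gamma^* = (\gamma_1^*, \ldots, \gamma_K^*) \in \mathbb{R}^K$. Finally, the logarithm of the conditional likelihood $L_{Y|X}$ may be written in terms of the compound parameter vector $\lambda = (\gamma^*, \theta) \in \mathbb{R}^{K+L}$:

(5.3) $l(\lambda) = \log L_{Y|X} = \sum_{k=0}^{K} \sum_{i=1}^{m_k} \log p_k^*(X_{ki}).$

The first and second derivative of $l(\lambda)$ are denoted by $D_\lambda l(\lambda)$ and $D^2_{\lambda\lambda} l(\lambda)$.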

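Let us briefly summarize the asymptotic properties of the estimator $\hat\lambda = (\hat\gamma^*, \hat\theta)$. The asymptotic approach assumes that the set $\Omega_Y = \{y_0, \ldots, y_K\}$ of conditioning values remains fixed while all subsample sizes $m_k$ tend to infinity with fixed ratios $\hat m_k = m_k/n > 0$ for all $n$ and $k$. Hence the nuisance parameter $\gamma^*$ and the conditional densities $p_k^*(x) = p^*(y_k \mid X = x)$ do not vary with $n$. The asymptotic unique existence of the estimator, the strong consistency of the sequence $\hat\lambda^{(n)}$ and its asymptotic normality can be derived under reasonable conditions. More precisely, using a block notation for the inverse of the information matrix $I(\lambda) = -E(D^2_{\lambda\lambda} l(\lambda))$, i.e.

(5.4) $I^{-1}(\lambda) = \begin{pmatrix} (I^{-1}(\lambda))_{\gamma\gamma} & (I^{-1}(\lambda))_{\gamma\theta} \\ (I^{-1}(\lambda))_{\theta\gamma} & (I^{-1}(\lambda))_{\theta\theta} \end{pmatrix},$

the asymptotic distribution is given by (cf. Osius, 2009 [5, Thm. 5])

(5.5) $\sqrt{n}\,[\hat\theta^{(n)} - \theta] \xrightarrow{\ \mathcal{L}\ } N\left(0, (i^{-1}(\lambda))_{\theta\theta}\right)$ as $n \to \infty$, with $i(\lambda) = n^{-1} I(\lambda)$.

The matrix

(5.6) $J(\lambda) = -D^2_{\lambda\lambda} l(\lambda)$

is a consistent estimator of $I(\lambda)$ and hence $\hat\theta \approx N\left(\theta, (J^{-1}(\lambda))_{\theta\theta}\right)$ with

(5.7) $(J^{-1}(\lambda))_{\theta\theta} = \left( J(\lambda)_{\theta\theta} - J(\lambda)_{\theta\gamma}\, (J(\lambda)_{\gamma\gamma})^{-1} J(\lambda)_{\gamma\theta} \right)^{-1}$

using a well known result for the inverse of a partitioned matrix:

(5.8) $\begin{pmatrix} L & M \\ G & H \end{pmatrix}^{-1} = \begin{pmatrix} L^{-1} + L^{-1} M N^{-1} G L^{-1} & -L^{-1} M N^{-1} \\ -N^{-1} G L^{-1} & N^{-1} \end{pmatrix}$

with $N = H - G L^{-1} M$.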
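The partitioned-inverse formula (5.8) is easy to check numerically. The sketch below assumes NumPy; the function name `partitioned_inverse` and the random test matrix are illustrative only.

```python
import numpy as np

def partitioned_inverse(L, M, G, H):
    """Inverse of [[L, M], [G, H]] via the Schur complement N = H - G L^{-1} M."""
    L_inv = np.linalg.inv(L)
    N_inv = np.linalg.inv(H - G @ L_inv @ M)
    top_left = L_inv + L_inv @ M @ N_inv @ G @ L_inv
    top_right = -L_inv @ M @ N_inv
    bottom_left = -N_inv @ G @ L_inv
    return np.block([[top_left, top_right], [bottom_left, N_inv]])

# compare with a direct inverse of a random positive definite matrix
rng = np.random.default_rng(0)
B = rng.normal(size=(5, 5))
A = B @ B.T + 5.0 * np.eye(5)
blockwise = partitioned_inverse(A[:2, :2], A[:2, 2:], A[2:, :2], A[2:, 2:])
print(np.allclose(blockwise, np.linalg.inv(A)))
```

The lower right block $N^{-1}$ is the part that corresponds to $(J^{-1}(\lambda))_{\theta\theta}$ in (5.7) when the first block belongs to the nuisance parameter.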



Note that for an observed data set, the estimated covariance matrix $(J^{-1}(\hat\lambda))_{\theta\theta}$, i.e. $\lambda$ is replaced in (5.7) by $\hat\lambda$, is identical to the corresponding matrix under sampling conditional on $X$ (instead of $Y$). In this sense the estimate $\hat\theta$ and its estimated asymptotic normal distribution are invariant under sampling conditional on either $Y$ or $X$. Hence asymptotic inference (i.e. tests or confidence regions) for the association parameter $\theta$ based on the asymptotic distribution (5.7) of the estimate $\hat\theta$ is invariant under both conditional sampling schemes, too.

For an observed table $(r_{jk})$ the matrix $J(\hat\lambda)$ may be computed as if sampling had been conditional on $X$ (instead of $Y$). However, for sampling conditional on $X$, (4.18) and theorem 4 imply

(5.9) $J^{-1}(\hat\lambda) = \hat\Sigma_\lambda$

with $\hat\Sigma_\lambda$ from the remark to theorem 1. In particular the estimated asymptotic covariance matrix of $\hat\theta$ coincides with the estimate of (4.6)

(5.10) $(J^{-1}(\hat\lambda))_{\theta\theta} = Z^{\circ-} C^T P^{\hat D}_{\mathcal{H}} \hat D^{-1} C Z^{\circ-T}$ with $\hat D = \mathrm{diag}\{\hat\mu\}$.

We note again that the table $\hat\mu$ is uniquely determined by the row and column totals of the observed table $(r_{jk})$ and the estimate $\hat\theta$.

Hence, the estimated asymptotic covariance matrix of $\hat\theta$ for sampling conditional on $Y$ is the same as for the usual fixed cells asymptotics where $X$ and $Y$ had finite support. And interchanging $X$ and $Y$ yields the same result for sampling conditional on $X$.



6 Applications 

This section deals with some applications of our theoretical results. The covariance matrix $\Sigma_\theta$ (for which we have given several representations) is needed not only to analyze a given sample by means of log-bilinear association models but also to investigate the properties of such an analysis, mainly the power of the tests involved and the calculation of the sample size necessary to achieve sufficient power. We first address power and sample size issues for unconditional and conditional sampling. Finally we take a closer look at generalized linear models with canonical link (in particular linear and log-linear models) and discuss the advantage of using the more general log-bilinear odds ratio models instead.

6.1 Power and Sample Size Issues 

Suppose we wish to test a linear hypothesis $H_0 : Q\theta = 0$ for a given matrix $Q$ against the alternative $H : Q\theta \ne 0$ using the usual test based on the asymptotic normal distribution of $Q\hat\theta$. As a typical example, suppose $X = (X', X'')$ consists of two blocks and we wish to test the hypothesis $H_0$ that $X''$ and $Y$ are independent, which is often of primary interest. Using separate functions $\tilde x' = h'_X(x')$ and $\tilde x'' = h''_X(x'')$ such that $\tilde x = (\tilde x', \tilde x'')$ and the block notation $\theta = (\theta', \theta'')$, the above hypothesis of independence is equivalent to the linear hypothesis $H_0 : \theta'' = 0$. If in addition $Y = (Y', Y'')$, a similar argument shows that the hypothesis "$X''$ and $Y''$ are independent" is a linear hypothesis too.

The asymptotic power of the test of $H_0 : Q\theta = 0$ may be computed from the covariance matrix of the estimator $\hat\theta$ using one of the above representations of $\Sigma_\theta$. We first look at contingency tables, i.e. both $X$ and $Y$ have finite support, and consider unconditional and conditional sampling separately.

Unconditional (multinomial) sampling for contingency tables: In the multinomial sampling (M) the vector of expectations is given by $\mu = np$ and from corollary 1 we get

(6.1) $\Sigma_\theta^{-1} = n\, (Y^{\circ T} \otimes X^{\circ T})\, (C^T \mathrm{diag}^{-1}\{p\}\, C)^{-1}\, (Y^\circ \otimes X^\circ)$

using (4.8) to evaluate $C^T \mathrm{diag}^{-1}\{p\}\, C$. The matrices $X^\circ$ and $Y^\circ$ contain only the known values $x_j$ and $y_k$, but the joint density $p$ additionally depends on $\theta$. For a given value $\theta'$ of interest from the alternative we wish to compute the asymptotic power of the test either retrospectively (i.e. after the sample has been drawn) or prospectively to obtain an optimal design for the study. Since $X^\circ$ and $Y^\circ$ are already known we only have to find the joint density $p'$ corresponding to $\theta'$ and the marginal probabilities $p_{j+}$ and $p_{+k}$. This unique $p'$ can be obtained by an iterative proportional fitting procedure (cf. Sinkhorn, 1967). Alternatively, $p'$ can be found by fitting the log-linear model (2.13) under the constraint $\theta = \theta'$ to an "observed" table $r'$ with marginals $r'_{j+} = p_{j+}$ and $r'_{+k} = p_{+k}$, e.g. $r'_{jk} = p_{j+} p_{+k}$. Using $p'$ instead of $p$ in (6.1) yields

(6.2) $\Sigma_{\theta'}^{-1} = n\, (Y^{\circ T} \otimes X^{\circ T})\, (C^T \mathrm{diag}^{-1}\{p'\}\, C)^{-1}\, (Y^\circ \otimes X^\circ)$

from which the asymptotic power of the test can be obtained. □
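The iterative proportional fitting step mentioned above can be sketched as follows for the special case of a scalar parameter with scores `x[j]` and `y[k]`, so that the log-odds ratio function is `theta * x[j] * y[k]` when the reference scores are zero. This is an illustrative implementation with hypothetical names, not the procedure used by the authors. It relies on the fact that rescaling rows and columns leaves all odds ratios unchanged.

```python
import math

def fit_joint_distribution(theta, x, y, row_margin, col_margin,
                           tol=1e-12, max_iter=10000):
    """Joint probabilities p[j][k] with odds-ratio structure exp(theta*x[j]*y[k])
    and the prescribed row and column margins, found by alternate rescaling."""
    p = [[math.exp(theta * xj * yk) for yk in y] for xj in x]
    for _ in range(max_iter):
        # rescale every row to its target margin
        for j, row in enumerate(p):
            s = sum(row)
            p[j] = [v * row_margin[j] / s for v in row]
        # rescale every column, remembering the largest column discrepancy
        max_err = 0.0
        for k in range(len(y)):
            s = sum(p[j][k] for j in range(len(x)))
            max_err = max(max_err, abs(s - col_margin[k]))
            for j in range(len(x)):
                p[j][k] *= col_margin[k] / s
        if max_err < tol:
            break
    return p

p_alt = fit_joint_distribution(0.5, [0, 1, 2], [0, 1], [0.2, 0.3, 0.5], [0.6, 0.4])
print(p_alt)
```

The resulting table plays the role of $p'$ in (6.2).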

Conditional sampling for contingency tables: Sampling conditional on $X$ leads to product multinomial sampling for rows (MR) where the expectations are given by $\mu'_{jk} = n_j\, p'_{k|j}$, with conditional probability $p'_{k|j} = p'_{jk}/p'_{j+}$. Using the total sample size $n = n_+$ and the relative sample sizes $\hat n_j = n_j/n$, which are typically fixed in advance (e.g. $n_j = n/(J + 1)$ in a balanced design), we get $\mu' = n p'^*$ with $p'^*_{jk} = \hat n_j\, p'_{jk}/p'_{j+}$ and hence

(6.3) $\Sigma_{\theta'}^{-1} = n\, (Y^{\circ T} \otimes X^{\circ T})\, (C^T \mathrm{diag}^{-1}\{p'^*\}\, C)^{-1}\, (Y^\circ \otimes X^\circ).$

The density $p'^*$ arises from $p'$ by replacing the marginal distribution of $X$ with the empirical distribution of $X$ given by the proportions $\hat n_j$. Note however that the marginal distribution of $Y$ changes when passing from $p'$ to $p'^*$, i.e. $p'^*_{+k}$ differs from $p'_{+k} = p_{+k}$. Consequently the joint distribution $p'^*$ is not determined by $\theta'$, $\hat n_j$ and $p_{+k}$ (for all $j, k$) alone, but still depends on the marginal distribution of $X$ although sampling is conditional on $X$.

The matrix (6.3), and hence the power of the test, depends not only on the total sample size $n$ but also on the proportions $\hat n_j$, which may be chosen to maximize the power.

And for sampling conditional on $Y$, i.e. the model (MC), we get the same representation (6.3) with $n = m_+$, $\hat m_k = m_k/n$ and $p'^*_{jk} = \hat m_k\, p'_{jk}/p'_{+k}$. Again the power of the test may be maximized with respect to the proportions $\hat m_k$. And if conditional sampling on $X$ and on $Y$ are both possible, then one can choose the sampling design with the highest power for the test. □
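For a $2 \times 2$ table the effect of the row proportions can be made explicit, because $n$ times the asymptotic variance of $\hat\theta$ is then the sum of the reciprocal cell probabilities of $p'^*$. The sketch below (hypothetical function names, simple grid search) reweights a joint table to given row proportions and looks for the proportion that minimises this variance, which is equivalent to maximising the power for a scalar parameter.

```python
def condition_on_rows(p, proportions):
    """p*_{jk} = n_hat_j * p_{jk} / p_{j+} for a joint table p and row proportions."""
    return [[w * v / sum(row) for v in row] for row, w in zip(p, proportions)]

def variance_per_observation_2x2(p_star):
    """n * Var(theta_hat) in a 2 x 2 table: sum of reciprocal cell probabilities."""
    return sum(1.0 / v for row in p_star for v in row)

def best_row_proportion(p, grid=999):
    """Grid search for the share of the second row that minimises the variance."""
    candidates = [(i + 1) / (grid + 1) for i in range(grid)]
    return min(
        candidates,
        key=lambda w: variance_per_observation_2x2(condition_on_rows(p, [1.0 - w, w])),
    )

p_alt = [[0.4, 0.1], [0.2, 0.3]]
w_opt = best_row_proportion(p_alt)
balanced = variance_per_observation_2x2(condition_on_rows(p_alt, [0.5, 0.5]))
optimal = variance_per_observation_2x2(condition_on_rows(p_alt, [1.0 - w_opt, w_opt]))
print(w_opt, balanced, optimal)
```

Exchanging the roles of rows and columns gives the corresponding calculation for sampling conditional on $Y$.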

To determine the total sample size $n$ necessary to achieve a desired power, we only have to increase $n$ in (6.2) resp. (6.3) until the given power is reached. The above considerations only apply when both $X$ and $Y$ have finite range. However, the distributions of $X$ and $Y$ can always be approximated by distributions with finite support, e.g. by grouping or rounding, and using the discrete approximations to compute the power should be sufficiently accurate for practical purposes.
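The search for the required sample size can be sketched for the simplest case of a scalar parameter and the two-sided Wald test of $H_0 : \theta = 0$. The sketch assumes that the asymptotic variance of $\hat\theta$ has the form `var_per_observation / n`, as in (6.2) and (6.3), and uses the normal approximation of the test statistic under the alternative. The function names and default levels are illustrative.

```python
from statistics import NormalDist

def wald_power(theta_alt, var_per_observation, n, alpha=0.05):
    """Approximate power of the two-sided Wald test of H0: theta = 0 at theta_alt."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(theta_alt) / (var_per_observation / n) ** 0.5
    return std_normal.cdf(shift - z) + std_normal.cdf(-shift - z)

def required_sample_size(theta_alt, var_per_observation, target_power=0.8, alpha=0.05):
    """Smallest n whose approximate power reaches target_power (power grows with n)."""
    hi = 1
    while wald_power(theta_alt, var_per_observation, hi, alpha) < target_power:
        hi *= 2
    lo = hi // 2
    while hi - lo > 1:                      # bisection between lo (too small) and hi
        mid = (lo + hi) // 2
        if wald_power(theta_alt, var_per_observation, mid, alpha) >= target_power:
            hi = mid
        else:
            lo = mid
    return hi

n_needed = required_sample_size(theta_alt=0.5, var_per_observation=10.0)
print(n_needed, wald_power(0.5, 10.0, n_needed))
```

For a vector-valued hypothesis $Q\theta = 0$ the same idea applies with a noncentral chi-square distribution in place of the shifted normal distribution.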
6.2 Generalized Linear Models With Canonical Link vs. Log-Bilinear Odds Ratio Models

In example 1 we have already seen that generalized linear models with canonical link function are log-bilinear odds ratio models. However, the latter models do not assume that the conditional distributions belong to the exponential family (2.7). We now explore in more detail the relationship between these regression and association models. Keeping the notation from section 2, we suppose that $Y$ is univariate with support $\Omega_Y \subset \mathbb{R}$. We consider the log-bilinear odds ratio model with respect to the identity map $h_Y = \mathrm{id}$ on $\mathbb{R}$, i.e. $\tilde y = y$:

(6.4) $\psi_\theta(x, y) = \tilde x^T \theta\, y$ for all $x, y$.

This model does not restrict the marginal distributions of $X$ and $Y$. But we assume that $\mathrm{Cov}(\tilde X)$ is positive definite and $0 < \sigma_Y^2 = \mathrm{Var}(Y) < \infty$, which guarantees for any $\theta$ the existence of a unique joint distribution $P$ with (6.4) and these marginals.

The logarithm of the conditional density (2.3) of $Y$ given $X = x$ may now be written

(6.5) $\log p(y \mid x) = \gamma(y) + \tau y - k(\tau)$ with $\tau = \tilde x^T \theta$ and $k(\tau) = \log \int \exp(\gamma(y) + \tau y)\, d\nu_Y(y).$

Although this density looks like a member of an exponential family with canonical parameter $\tau$, it need not be a density for any value of $\tau$ other than $\tilde x^T\theta$. However, the expectation and variance of the conditional distribution are still given by the derivatives of $k$

(6.6) $E(Y \mid X = x) = k'(\tau) = k'(\tilde x^T\theta) =: \mu_x(\theta), \qquad \mathrm{Var}(Y \mid X = x) = k''(\tau) = k''(\tilde x^T\theta) =: \sigma_x^2(\theta)$

provided the following regularity condition holds, which allows interchanging differentiation with integration

(6.7) $D_\tau \int \exp(\gamma(y) + \tau y)\, d\nu_Y(y) = \int \left[ D_\tau \exp(\gamma(y) + \tau y) \right] d\nu_Y(y), \qquad D^2_{\tau\tau} \int \exp(\gamma(y) + \tau y)\, d\nu_Y(y) = \int \left[ D^2_{\tau\tau} \exp(\gamma(y) + \tau y) \right] d\nu_Y(y).$

The derivative of the conditional expectation with respect to $\theta$ is

(6.8) $\mu_x'(\theta) = k''(\tilde x^T\theta)\, \tilde x^T = \sigma_x^2(\theta)\, \tilde x^T.$

We will now see how the linear resp. log-linear or logistic regression model emerges from the association model when the respective structure for the conditional variance is assumed.

6.2.1 Linear Regression 

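Now $Y$ has a continuous distribution with support $\Omega_Y = \mathbb{R}$ and we assume that the conditional variance is constant and positive

(6.9) $\sigma_x^2(\theta) = \sigma^2 > 0$ for all $x$ and $\theta$,

which is a common assumption in linear regression models. Then the derivative $\mu_x'(\theta)$ does not depend on $\theta$ and hence the conditional expectation may be written as

(6.10) $\mu_x(\theta) = \beta_0 + \tilde x^T \beta$ for all $x$ with
(6.11) $\beta = \sigma^2 \theta$

and some constant $\beta_0 \in \mathbb{R}$. Conversely, the linear model (6.10) and (6.11) together imply (6.9) in view of (6.8). From (6.9) and (6.10) one easily obtains

(6.12) $E(Y) = \beta_0 + \beta^T E(\tilde X), \qquad \sigma^2 = \sigma_Y^2 - \beta^T \mathrm{Cov}(\tilde X)\, \beta = \sigma_Y^2 - \|\beta\|^2_{\mathrm{Cov}(\tilde X)}$

using the norm induced by $\mathrm{Cov}(\tilde X)$ (cf. appendix A). This in turn gives the odds ratio parameter $\theta$ in terms of the regression parameter $\beta$ and the second order moments of the (marginal) distributions of $X$ and $Y$

(6.13) $\theta = \dfrac{\beta}{\sigma_Y^2 - \|\beta\|^2_{\mathrm{Cov}(\tilde X)}}$

which coincides with (2.23) for univariate $Y$. In order to recover the regression parameter $\beta$ from $\theta$, given $\sigma_Y^2$ and $\mathrm{Cov}(\tilde X)$, we consider the norms

(6.14) $\|\theta\|_{\mathrm{Cov}(\tilde X)} = \dfrac{\|\beta\|_{\mathrm{Cov}(\tilde X)}}{\sigma_Y^2 - \|\beta\|^2_{\mathrm{Cov}(\tilde X)}} = f\left(\|\beta\|_{\mathrm{Cov}(\tilde X)}\right).$

The function $f(u) = u/(\sigma_Y^2 - u^2)$, defined for $u^2 \ne \sigma_Y^2$, with derivative $f'(u) = (\sigma_Y^2 + u^2)/(\sigma_Y^2 - u^2)^2$ is strictly increasing for $0 \le u < \sigma_Y$ from $f(0) = 0$ to its left-sided limit $f(\sigma_Y-) = \infty$. Hence $f$ has an inverse $f^{-1} : [0, \infty) \to [0, \sigma_Y)$ given by

(6.15) $f^{-1}(v) = \begin{cases} 0 & \text{for } v = 0, \\ \dfrac{\sqrt{1 + 4 \sigma_Y^2 v^2} - 1}{2v} & \text{for } v > 0. \end{cases}$

Now we obtain $\|\beta\|_{\mathrm{Cov}(\tilde X)} = f^{-1}\left(\|\theta\|_{\mathrm{Cov}(\tilde X)}\right)$ from (6.14), which inserted in (6.13) yields $\beta$ in terms of $\theta$ and the second order moments of the (marginal) distributions of $X$ and $Y$

(6.16) $\beta = \left( \sigma_Y^2 - \left[ f^{-1}\left(\|\theta\|_{\mathrm{Cov}(\tilde X)}\right) \right]^2 \right) \theta.$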
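The passage between the regression parameter $\beta$ and the odds ratio parameter $\theta$ described in (6.12) to (6.16) can be sketched in code. The sketch assumes that the relation is $\theta = \beta/\sigma^2$ with $\sigma^2 = \sigma_Y^2 - \beta^T \mathrm{Cov}(\tilde X)\beta$, as the surrounding derivation indicates; the function names are hypothetical. The inverse of $f(u) = u/(\sigma_Y^2 - u^2)$ is written in an algebraically equivalent form that avoids cancellation for small arguments.

```python
import math

def quad_form(v, cov):
    """v^T cov v for a vector v and a covariance matrix given as nested lists."""
    return sum(v[a] * cov[a][b] * v[b] for a in range(len(v)) for b in range(len(v)))

def theta_from_beta(beta, cov_x, var_y):
    """theta = beta / sigma^2 with sigma^2 = var_y - beta^T cov_x beta."""
    residual_var = var_y - quad_form(beta, cov_x)
    if residual_var <= 0.0:
        raise ValueError("beta^T Cov(X) beta must be smaller than Var(Y)")
    return [b / residual_var for b in beta]

def f_inverse(v, var_y):
    """Inverse of f(u) = u / (var_y - u^2) on [0, inf); equals
    (sqrt(1 + 4 var_y v^2) - 1) / (2 v) for v > 0 and 0 for v = 0."""
    return 2.0 * var_y * v / (1.0 + math.sqrt(1.0 + 4.0 * var_y * v * v))

def beta_from_theta(theta, cov_x, var_y):
    """Recover beta = sigma^2 * theta from theta, Cov(X) and Var(Y)."""
    norm_beta = f_inverse(math.sqrt(quad_form(theta, cov_x)), var_y)
    residual_var = var_y - norm_beta ** 2
    return [residual_var * t for t in theta]

cov_x = [[1.0, 0.2], [0.2, 2.0]]
theta = theta_from_beta([0.5, -0.3], cov_x, var_y=4.0)
print(theta, beta_from_theta(theta, cov_x, var_y=4.0))
```

Applied to an estimate $\hat\theta$ and to known or estimated second moments, `beta_from_theta` gives a plug-in value for $\beta$.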
From the above discussion the log-bilinear association model (6.4) appears as a generalization of the classical linear model which, in addition to (6.9), assumes the conditional distribution to be normal

(6.17) $\mathcal{L}(Y \mid X = x) = N(\mu_x(\theta), \sigma^2).$

Furthermore, the linear model only leaves the marginal distribution of $X$ unconstrained but introduces a connection between the marginal distributions of $X$ and $Y$, e.g. through (6.12).

As already mentioned, using the association model (6.4) instead of the regression model (6.10) with (6.9) also allows asymptotic inference about $\beta$, for sampling conditional on either $X$ or $Y$, because $\theta$ and $\beta$ only differ by the positive (unknown) constant $\sigma^2$.

Furthermore a one-sided hypothesis $H_0 : c^T\beta \le 0$ for a given vector $c$ is equivalent to $H_0 : c^T\theta \le 0$, and a linear hypothesis $H_0 : Q\beta = 0$ for a given matrix $Q$ is equivalent to $H_0 : Q\theta = 0$. To compute the power of the corresponding test for a given value $\beta'$ under the alternative we have to assume realistic values for the variance $\sigma_Y^2$ and the covariance matrix $\mathrm{Cov}(\tilde X)$ in order to get the corresponding values of $\sigma'^2$ and $\theta'$ from (6.12) and (6.11) for $\beta'$. Then the considerations in section 6.1 can be applied to obtain, for the intended sampling scheme, the corresponding covariance matrix, which allows the computation of the power and of the sample size necessary to achieve a given power.

Using the log-bilinear odds ratio model determined by (2.4) and (2.5) has two advantages over the usual linear regression model given by (6.9) and (6.10). First, no assumptions about the conditional distribution of $Y$ given $X$ are needed and in particular, (6.9) need not hold. And second, sampling may be conditional on $Y$ instead of $X$, which may be preferable from a practical point of view or to achieve a higher power.

However, even if the linear model holds and if the marginal variance $\sigma_Y^2$ and the covariance matrix $\mathrm{Cov}(\tilde X)$ are known (or consistent estimates are available, e.g. from previous studies), then a plug-in estimator $\hat\beta$ of $\beta$ can be obtained from (6.16) and $\hat\theta$. Furthermore the asymptotic normality of $\hat\theta$ provides the asymptotic normal distribution of $\hat\beta$ by the delta-method.



6.2.2 Log-Linear Regression 

We now consider the case where $Y$ is discrete with support $\Omega_Y = \mathbb{N} \cup \{0\}$ and assume

(6.18) $\sigma_x^2(\theta) = \mu_x(\theta) > 0$ for all $x$ and $\theta$,

i.e. the Poisson variance function applies. Then by (6.6) $k'(\tilde x^T\theta) = k''(\tilde x^T\theta)$ for all $x$ and $\theta$, which in turn implies $k'(\tilde x^T\theta) = \exp(\beta_0 + \tilde x^T\theta) + c$ for some constants $\beta_0, c \in \mathbb{R}$. If the expectation $\mu_x(\theta)$ is allowed to take any positive value, then $c$ must be zero and we get the familiar log-linear model

(6.19) $\mu_x(\theta) = \exp(\beta_0 + \tilde x^T\beta)$ with $\beta = \theta$.

Hence the association model (6.4) appears as a generalization of the log-linear model which, in addition to (6.18), restricts the conditional distributions to Poisson distributions

(6.20) $\mathcal{L}(Y \mid X = x) = \mathrm{Pois}(\mu_x(\theta)).$

Since $\beta = \theta$, asymptotic inference about the regression parameter $\beta$ of the log-linear model (6.19) may also be obtained from the more general association model, which imposes no restriction on the conditional distribution of $Y$ given $X$, e.g. (6.18), and where sampling may be conditional on either $X$ or $Y$.



6.2.3 Logistic Regression 

Looking finally at a binary random variable $Y$ with support $\Omega_Y = \{0, 1\}$ we only note, as already mentioned in example 3, that the (univariate) logistic regression model is equivalent to the association model (6.4), so that no new aspects arise by using the latter model.



7 Resume and Discussion 

For a pair of random vectors $(X, Y)$ we have looked at semi-parametric association models with log-bilinear association, which include multivariate linear logistic regression, log-linear models for contingency tables as well as univariate and multivariate linear regression models. Given a sample $(x_i, y_i)$, $i = 1, \ldots, n$, the statistical inference for the odds-ratio parameter $\theta$ (i.e. tests and confidence regions) depends on the distribution of $\hat\theta$, which typically is asymptotically normal and whose covariance has to be estimated. The asymptotic approaches depend on the sampling scheme (conditional on $X$ resp. $Y$ or unconditional) and differ if $Y$ resp. $X$, or both, have finite support. We have shown however that the estimated asymptotic covariance matrix of $\hat\theta$ is invariant against the usual sampling schemes and does not depend on the support of $X$ or $Y$ being finite or arbitrary.

More precisely, we first considered the case where $X$ and $Y$ both have finite support. Then the log-bilinear odds-ratio model is a log-linear model for the expectations of the corresponding contingency table and by theorem 1 the estimate $\hat\Sigma_\theta$ of the asymptotic covariance matrix is invariant against the common sampling schemes. Explicit representations for computing the matrix are given in theorems 2, 3 and corollary 1. Allowing arbitrary support for $X$ but finite support for $Y$, the log-bilinear association model is a multivariate linear logistic regression model. Our theorem 4 implies that in this case the asymptotic covariance matrix of $\hat\theta$ coincides with $\Sigma_\theta$ (where $X$ had finite support too).

To cover the general case with arbitrary supports of $X$ and $Y$ we looked at sampling conditional on $Y$ and an asymptotic approach where the set of conditioning values remains fixed. Combining the findings here with our earlier work we found that for a given sample the estimated asymptotic covariance matrix of $\hat\theta$ coincides with the one computed for the observed contingency table under fixed cells asymptotics. And a dual result holds for sampling conditional on $X$ instead of $Y$.

Hence for asymptotic inference about the association parameter $\theta$ one may assume any of the above sampling schemes and the statistical analysis of the sample can proceed as if both $X$ and $Y$ have finite support. Probably the simplest approach is to analyze the observed contingency table containing the counts $r_{jk}$ for all observed combinations of $x$-values and $y$-values using a log-linear model. Then an estimate of $\Sigma_\theta$ is obtained from corollary 1 by using the estimate $\hat D = \mathrm{diag}\{\hat\mu\}$ instead of $D$. As a first application we have explained how our results allow us to compute the asymptotic power for tests of linear hypotheses about $\theta$ and to determine the sample sizes needed to achieve a given power. Furthermore we have recovered the linear and log-linear regression model for univariate $Y$ from a more general log-bilinear association model.

Semiparametric association models do not restrict the marginal distributions of $X$ and $Y$. But more important, statistical inference about the association parameter $\theta$ is possible for conditional sampling on either $X$ or $Y$. If $X$ is considered as an "input" and $Y$ as an "output" then sampling conditional on $X$ is a natural approach. However, in certain situations sampling conditional on $Y$ may be advantageous, e.g. because it takes less time or money. For finite $Y$, for example, sampling conditional on $Y$ is very popular in epidemiology (case-control studies) and econometrics (choice-based samples), mainly because of their retrospective character. But as we have shown, sampling conditional on $Y$ may also be used if $Y$ has arbitrary support. In particular, using for univariate continuous $Y$ the more general log-bilinear odds-ratio model instead of the linear regression model allows asymptotic inference even for the regression parameter when sampling is conditional on $Y$.

If sampling conditional on $X$ and on $Y$ are both an option, then the sampling scheme can be chosen to maximize the power of the test concerning the hypothesis of primary interest. This is well known for binary $X$ and $Y$ in the context of $2 \times 2$ tables and our results allow similar considerations for arbitrary $X$ and $Y$.



Appendix 

In appendices A and B we summarize some definitions and results from linear algebra, which are used freely throughout the paper without explicit reference. In appendix C the proofs of the theorems are given.



A Inner Products and Orthogonal Projections 

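Any positive-definite symmetric $(I \times I)$ matrix $D$ induces an inner product on the vector space $\mathbb{R}^I$ given by $\langle a, b \rangle_D = a^T D b$, and orthogonality with respect to this inner product will be called $D$-orthogonality (denoted by $\perp_D$).

Consider a linear subspace $\mathcal{N}$ of $\mathbb{R}^I$ and a matrix $X$ whose columns form a basis of $\mathcal{N}$. The $D$-orthogonal projection onto $\mathcal{N}$ can be represented as an $I \times I$ matrix

(A.1) $P^D_{\mathcal{N}} = X (X^T D X)^{-1} X^T D.$

Some basic properties are

(A.2) $P^D_{\mathcal{N}} P^D_{\mathcal{N}} = P^D_{\mathcal{N}}$
(A.3) $(P^D_{\mathcal{N}})^T = D P^D_{\mathcal{N}} D^{-1}$
(A.4) $(P^D_{\mathcal{N}})^T D P^D_{\mathcal{N}} = D P^D_{\mathcal{N}}.$

The $D$-orthogonal projection onto the $D$-orthogonal complement $\mathcal{N}^{\perp_D} = D^{-1}[\mathcal{N}^\perp]$ satisfies

(A.5) $P^D_{\mathcal{N}^{\perp_D}} = I - P^D_{\mathcal{N}}$
(A.6) $P^D_{\mathcal{N}^{\perp_D}} = D^{-1} P^{D^{-1}}_{\mathcal{N}^\perp} D$

with the identity matrix $I$. For another linear subspace

(A.7) $\mathcal{M} \subset \mathcal{N}$

it holds

(A.8) $P^D_{\mathcal{M}} P^D_{\mathcal{N}} = P^D_{\mathcal{M}} = P^D_{\mathcal{N}} P^D_{\mathcal{M}}.$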



B Kronecker Products 



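The Kronecker product of two matrices, denoted by $A \otimes B$, is defined as the partitioned matrix (cf. Graham, 1981 [3])

(B.1) $A \otimes B = \begin{pmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & \vdots & & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{pmatrix}.$

Some basic properties are

(B.2) $(\alpha A) \otimes (\beta B) = (\alpha\beta)(A \otimes B)$
(B.3) $(A + B) \otimes C = A \otimes C + B \otimes C$
(B.4) $A \otimes (B + C) = A \otimes B + A \otimes C$
(B.5) $(A \otimes B)^T = A^T \otimes B^T$
(B.6) $(A \otimes B)(C \otimes D) = AC \otimes BD$
(B.7) $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$
(B.8) $\mathrm{vec}(AYB) = (B^T \otimes A)\, \mathrm{vec}(Y)$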

C Proofs 

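C.1 Proof of Theorem 1

(4.3) restricts $\psi^\circ$ to the linear subspace $\{Z^\circ\theta \mid \theta \in \mathbb{R}^L\}$, i.e.

(C.1) $\psi^\circ = Z^\circ\theta$

where $Z^\circ$ is assumed to have rank $L$. The parameters $\alpha$, $\rho_j$, $\gamma_k$ and $\psi_{jk}$ are linear functions of the log-expectation $\eta$ and in particular

(C.2) $\gamma_k = \eta_{0k} - \eta_{00}$
(C.3) $\psi_{jk} = \eta_{jk} + \eta_{00} - \eta_{j0} - \eta_{0k}.$

Then $\gamma^\circ$ and $\psi^\circ$ are given by

(C.4) $\gamma^\circ = B^T\eta, \qquad \psi^\circ = C^T\eta.$

The columns of $B$ are orthogonal to the row space $\mathcal{R}$ and hence

(C.5) $B^T P^D_{\mathcal{R}} = 0.$

The columns of $C$ span the orthogonal complement of the marginal space $\mathcal{M}$ and thus

(C.6) $P^D_{\mathcal{M}} D^{-1} C = 0, \qquad C^T P^D_{\mathcal{R}} = 0$

since $\mathcal{R} \subset \mathcal{M}$. The parameter $\lambda = (\gamma^\circ, \theta)$ is linked to $(\gamma^\circ, \psi^\circ)$ in the following way

(C.7) $\begin{pmatrix} \gamma^\circ \\ \psi^\circ \end{pmatrix} = \begin{pmatrix} I & 0 \\ 0 & Z^\circ \end{pmatrix} \begin{pmatrix} \gamma^\circ \\ \theta \end{pmatrix}.$

Since the $(JK) \times L$ matrix $Z^\circ$ has rank $L$ and hence a left inverse $Z^{\circ-}$ we get

(C.8) $\lambda = \begin{pmatrix} \gamma^\circ \\ \theta \end{pmatrix} = \begin{pmatrix} I & 0 \\ 0 & Z^{\circ-} \end{pmatrix} \begin{pmatrix} \gamma^\circ \\ \psi^\circ \end{pmatrix} = \begin{pmatrix} B^T \\ Z^{\circ-} C^T \end{pmatrix} \eta.$

Hence the asymptotic covariance matrix of the estimator $\hat\lambda$ can be derived from $\Sigma_\eta$ as

(C.9) $\Sigma_\lambda = \begin{pmatrix} B^T \\ Z^{\circ-} C^T \end{pmatrix} \Sigma_\eta \left( B,\ C Z^{\circ-T} \right).$

Using the block notation

(C.10) $\Sigma_\lambda = \begin{pmatrix} \Sigma_{\gamma^\circ} & \Sigma_{\gamma^\circ\theta} \\ \Sigma_{\theta\gamma^\circ} & \Sigma_\theta \end{pmatrix}$

we get the asymptotic covariance matrix of $\hat\theta$ as

(C.11) $\Sigma_\theta = Z^{\circ-} C^T \Sigma_\eta\, C Z^{\circ-T}.$

□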



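C.2 Proof of Theorem 2

The space $\mathcal{H}$ of the log-expectation can then be decomposed into the direct sum

(C.12) $\mathcal{H} = \mathcal{M} \oplus \mathcal{V}'$ with $\mathcal{V}' = \mathcal{H} \cap \mathcal{M}^{\perp_D}.$

$\mathcal{V}'$ is the $D$-orthogonal complement of $\mathcal{M}$ in $\mathcal{H}$. Applying the orthogonal projection $P^D_{\mathcal{M}^{\perp_D}}$ on $\mathcal{H}$ yields

(C.13) $P^D_{\mathcal{M}^{\perp_D}}[\mathcal{H}] = \mathcal{V}'$

and hence the columns of the $(I \times L)$-matrix

(C.14) $V = P^D_{\mathcal{M}^{\perp_D}} Z$

span $\mathcal{V}'$. Using the representation

(C.15) $P^D_{\mathcal{V}'} = V (V^T D V)^{-1} V^T D$

and (C.6) we get

(C.16) $C^T P^D_{\mathcal{H}} D^{-1} C = C^T P^D_{\mathcal{M}} D^{-1} C + C^T P^D_{\mathcal{V}'} D^{-1} C = C^T V (V^T D V)^{-1} V^T C.$

Since the columns of $C$ are elements of $\mathcal{M}^\perp$ we get

(C.17) $Z^T C = Z^T P^{D^{-1}}_{\mathcal{M}^\perp} C = Z^T (P^D_{\mathcal{M}^{\perp_D}})^T C = V^T C.$

For $j, k > 0$ the $jk$-th row of $C^T Z$

(C.18) $(C^T Z)_{jk} = c_{jk}^T Z = e_{jk}^T Z + e_{00}^T Z - e_{j0}^T Z - e_{0k}^T Z = z_{jk}$

holds because $z_{j0} = z_{0k} = 0$ and therefore

(C.19) $C^T V = C^T Z = Z^\circ.$

With (C.16) and (C.14) this leads to different representations of $\Sigma_\theta$ from (C.11)

(C.20) $\Sigma_\theta = Z^{\circ-} \left( C^T V (V^T D V)^{-1} V^T C \right) Z^{\circ-T}$
$\quad = \left( Z^T (P^D_{\mathcal{M}^{\perp_D}})^T D P^D_{\mathcal{M}^{\perp_D}} Z \right)^{-1}$
$\quad = \left( Z^T (P^D_{\mathcal{M}^{\perp_D}})^T D Z \right)^{-1}$
$\quad = \left( Z^T C (C^T D^{-1} C)^{-1} C^T D^{-1} D Z \right)^{-1}$

and thus with (C.19) the representation (4.7) is obtained.

□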



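C.3 Proof of Theorem 3

The marginal space $\mathcal{M}$ can be decomposed into $\mathcal{M} = \mathcal{R} + \mathcal{E}$ where $\mathcal{E}$ is spanned by the columns of the matrix $E = (e_{+1}, \ldots, e_{+K})$. The columns of the $(I \times K)$ matrix

(C.21) $S = P^D_{\mathcal{R}^{\perp_D}} E$

span the $D$-orthogonal complement $\mathcal{N}$ of $\mathcal{R}$ within $\mathcal{M}$. The direct decomposition

(C.22) $\mathcal{M} = \mathcal{R} \oplus \mathcal{N}$

leads to

(C.23) $Z^T D P^D_{\mathcal{M}^{\perp_D}} Z = Z^T D Z - Z^T D P^D_{\mathcal{M}} Z = Z^T D P^D_{\mathcal{R}^{\perp_D}} Z - Z^T D P^D_{\mathcal{N}} Z.$

Since $\Sigma_\theta$ is invariant against the underlying distribution model we will assume for the rest of the proof the product multinomial sampling for rows. The rows are independent of each other and we know that $\mathrm{Cov}_{MR}(R) = D P^D_{\mathcal{R}^{\perp_D}}$ (cf. (3.13)), where the index MR refers to the sampling scheme. The $D$-orthogonal projection on $\mathcal{N}$ can now be specified as

(C.24) $D P^D_{\mathcal{N}} = D S (S^T D S)^{-1} S^T D$
$\quad = D P^D_{\mathcal{R}^{\perp_D}} E\, (E^T D P^D_{\mathcal{R}^{\perp_D}} E)^{-1} (D P^D_{\mathcal{R}^{\perp_D}} E)^T$
$\quad = \mathrm{Cov}_{MR}(R)\, E\, (E^T \mathrm{Cov}_{MR}(R)\, E)^{-1} E^T \mathrm{Cov}_{MR}(R).$

Together with (C.23) we obtain from (4.7)

(C.25) $\Sigma_\theta = \left[ Z^T D P^D_{\mathcal{M}^{\perp_D}} Z \right]^{-1}$
$\quad = \left[ Z^T D P^D_{\mathcal{R}^{\perp_D}} Z - Z^T D P^D_{\mathcal{N}} Z \right]^{-1}$
$\quad = \left[ Z^T \mathrm{Cov}_{MR}(R)\, Z - Z^T \mathrm{Cov}_{MR}(R)\, E\, (E^T \mathrm{Cov}_{MR}(R)\, E)^{-1} E^T \mathrm{Cov}_{MR}(R)\, Z \right]^{-1}.$

□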



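C.4 Proof of Theorem 4

We first determine the matrices $\mathrm{Cov}(U(\lambda))$ and $\Sigma_\lambda$ with $E = (e_{+1}, \ldots, e_{+K})$ and $Z = (z_{jk})_{jk}$, $z_{jk}$ being the rows of $Z$. The score vector may be written as

(C.26) $U(\lambda) = \begin{pmatrix} U_\gamma(\lambda) \\ U_\theta(\lambda) \end{pmatrix} = \begin{pmatrix} \left( \sum_j (R_{jk} - \mu_{jk}) \right)_{k=1,\ldots,K} \\ \sum_{j,k} z_{jk}^T (R_{jk} - \mu_{jk}) \end{pmatrix} = \begin{pmatrix} E^T (R - \mu) \\ Z^T (R - \mu) \end{pmatrix}.$

Hence

(C.27) $\mathrm{Cov}(U(\lambda)) = \begin{pmatrix} E^T \mathrm{Cov}(R) E & E^T \mathrm{Cov}(R) Z \\ Z^T \mathrm{Cov}(R) E & Z^T \mathrm{Cov}(R) Z \end{pmatrix}.$

The matrix $\Sigma_\lambda$ has a block representation (4.5) and we know from (4.4), (C.5) and (C.11) that

(C.28) $\Sigma_\lambda = \begin{pmatrix} [B^T P^D_{\mathcal{H}} - B^T P^D_{\mathcal{R}}] D^{-1} B & [B^T P^D_{\mathcal{H}} - B^T P^D_{\mathcal{R}}] D^{-1} C Z^{\circ-T} \\ \left( [B^T P^D_{\mathcal{H}} - B^T P^D_{\mathcal{R}}] D^{-1} C Z^{\circ-T} \right)^T & Z^{\circ-} C^T P^D_{\mathcal{H}} D^{-1} C Z^{\circ-T} \end{pmatrix} = \begin{pmatrix} B^T P^D_{\mathcal{H}} D^{-1} B & B^T P^D_{\mathcal{H}} D^{-1} C Z^{\circ-T} \\ \left( B^T P^D_{\mathcal{H}} D^{-1} C Z^{\circ-T} \right)^T & Z^{\circ-} C^T P^D_{\mathcal{H}} D^{-1} C Z^{\circ-T} \end{pmatrix}.$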



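Each of these blocks will be determined separately, similarly to (C.16). First we obtain $B^T E = I_K$ since, writing $b_k$ for the columns of $B$,

$b_k^T e_{+k} = 1$ for $k > 0$, and $b_i^T e_{+k} = 0$ for $i > 0$ and $i \ne k$.

It follows from (C.5)

(C.29) $B^T D^{-1} \mathrm{Cov}(R)\, E = B^T (I - P^D_{\mathcal{R}}) E = B^T E - B^T P^D_{\mathcal{R}} E = I_K.$

The $D$-orthogonal decomposition (C.12) as well as (C.6), (C.15) and (C.19) lead to

$\Sigma_{\gamma^\circ\theta} = B^T P^D_{\mathcal{M}} D^{-1} C Z^{\circ-T} + B^T P^D_{\mathcal{V}'} D^{-1} C Z^{\circ-T} = B^T P^D_{\mathcal{V}'} D^{-1} C Z^{\circ-T} = B^T V (V^T D V)^{-1}.$

And the $D$-orthogonal decomposition $\mathcal{M} = \mathcal{R} \oplus \mathcal{N}$ (C.22) together with (C.5), (C.15), (C.24) and (C.29) yields

$\Sigma_{\gamma^\circ} = B^T P^D_{\mathcal{M}} D^{-1} B + B^T P^D_{\mathcal{V}'} D^{-1} B = B^T P^D_{\mathcal{R}} D^{-1} B + B^T P^D_{\mathcal{N}} D^{-1} B + B^T P^D_{\mathcal{V}'} D^{-1} B$
$\quad = B^T P^D_{\mathcal{N}} D^{-1} B + B^T P^D_{\mathcal{V}'} D^{-1} B = B^T P^D_{\mathcal{N}} D^{-1} B + B^T V (V^T D V)^{-1} V^T B$
$\quad = B^T D^{-1} \left( \mathrm{Cov}(R) E (E^T \mathrm{Cov}(R) E)^{-1} E^T \mathrm{Cov}(R) \right) D^{-1} B + B^T V (V^T D V)^{-1} V^T B$
$\quad = (E^T \mathrm{Cov}(R) E)^{-1} + B^T V (V^T D V)^{-1} V^T B.$

Using (4.7) we can summarize this into

(C.30) $\Sigma_\lambda = \begin{pmatrix} (E^T \mathrm{Cov}(R) E)^{-1} + B^T V (V^T D V)^{-1} V^T B & B^T V (V^T D V)^{-1} \\ (V^T D V)^{-1} V^T B & (V^T D V)^{-1} \end{pmatrix}.$

To prove the theorem we further examine (C.30). The term $(V^T D V)^{-1}$ is known from previous considerations. We now have a closer look at the remaining term $V^T B$. Since the first $K + 1$ rows of $Z$ are equal to zero and the first $K + 1$ rows of $B$ are the only rows of $B$ with nonzero entries, we get $Z^T B = 0$. From (C.14), (C.22), (C.24) and (C.29) it follows

(C.31) $V^T B = Z^T D P^D_{\mathcal{M}^{\perp_D}} D^{-1} B = Z^T D D^{-1} B - Z^T D P^D_{\mathcal{M}} D^{-1} B$
$\quad = -Z^T D (P^D_{\mathcal{R}} + P^D_{\mathcal{N}}) D^{-1} B = -Z^T (B^T P^D_{\mathcal{R}})^T - Z^T D P^D_{\mathcal{N}} D^{-1} B$
$\quad = -Z^T D P^D_{\mathcal{N}} D^{-1} B = -Z^T \mathrm{Cov}(R) E (E^T \mathrm{Cov}(R) E)^{-1} E^T \mathrm{Cov}(R) D^{-1} B$
$\quad = -Z^T \mathrm{Cov}(R) E (E^T \mathrm{Cov}(R) E)^{-1}.$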

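After determining all components of $\Sigma_\lambda$ we are going to invert $\mathrm{Cov}(U(\lambda))$ using (5.8). For

$A = \mathrm{Cov}(U(\lambda)) = \begin{pmatrix} E^T \mathrm{Cov}(R) E & E^T \mathrm{Cov}(R) Z \\ Z^T \mathrm{Cov}(R) E & Z^T \mathrm{Cov}(R) Z \end{pmatrix}$

we compute $A^{-1} = \mathrm{Cov}(U(\lambda))^{-1}$ and let

$L = E^T \mathrm{Cov}(R) E, \qquad M = E^T \mathrm{Cov}(R) Z, \qquad G = M^T, \qquad H = Z^T \mathrm{Cov}(R) Z.$

Then (C.25), (4.7) and (C.31) lead to

$N = H - G L^{-1} M = Z^T \mathrm{Cov}(R) Z - Z^T \mathrm{Cov}(R) E (E^T \mathrm{Cov}(R) E)^{-1} E^T \mathrm{Cov}(R) Z = \Sigma_\theta^{-1} = V^T D V$
$-L^{-1} M = -(E^T \mathrm{Cov}(R) E)^{-1} E^T \mathrm{Cov}(R) Z = B^T V$
$-G L^{-1} = -Z^T \mathrm{Cov}(R) E (E^T \mathrm{Cov}(R) E)^{-1} = V^T B$

and accordingly

$L^{-1} + L^{-1} M N^{-1} G L^{-1} = (E^T \mathrm{Cov}(R) E)^{-1} + B^T V (V^T D V)^{-1} V^T B.$

Summing up, (5.8) and (C.30) yield

$\mathrm{Cov}(U(\lambda))^{-1} = \begin{pmatrix} (E^T \mathrm{Cov}(R) E)^{-1} + B^T V (V^T D V)^{-1} V^T B & B^T V (V^T D V)^{-1} \\ (V^T D V)^{-1} V^T B & (V^T D V)^{-1} \end{pmatrix} = \Sigma_\lambda.$

□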